Preface


The preface to the first edition of this text explained our mission as follows: 

This textbook is organized around the principle that much of actuarial 
science consists of the construction and analysis of mathematical models which
describe the process by which funds flow into and out of an insurance system. 
An analysis of the entire system is beyond the scope of a single text, so we 
have concentrated our efforts on the loss process, that is, the outflow of cash 
due to the payment of benefits. 

We have not assumed that the reader has any substantial knowledge of
insurance systems. Insurance terms are defined when they are first used. In 
fact, most of the material could be disassociated from the insurance process 
altogether, and this book could be just another applied statistics text. What 
we have done is kept the examples focused on insurance, presented the mate- 
rial in the language and context of insurance, and tried to avoid getting into 
statistical methods that would have little use in actuarial practice. 

In particular, the first edition of this text was published in 1998 to achieve 
three goals: 


1. Update the distribution fitting material from Loss Distributions [59] by 
Robert Hogg and Stuart Klugman, published in 1984. 


2. Update material on discrete distributions and collective risk model cal- 
culations from Insurance Risk Models [106] by Harry Panjer and Gordon 
Willmot, published in 1992.


3. Integrate the material the three authors had developed for the Society
of Actuaries’ Intensive Seminar 152, Applied Risk Theory. 


Shortly after publication, the Casualty Actuarial Society and the Society of 
Actuaries altered their examination syllabus to include our first edition. The 
good news was that the first edition was selected as source material for the 
new third and fourth examinations. The bad news was that the subject matter 
was split between the examinations in a manner that was not compatible with 
the organization of the text. By itself, that is sufficient reason to produce a 
revision. But there are others. 


1. The first edition was written with an assumption that readers would 
be familiar with the subject of mathematical statistics. This had been
part of the actuarial examination process at the time the book was 
written but was subsequently removed. Some background material on 
mathematical statistics is now presented in Chapter 9. 


2. For a long time, actuarial education has included the subject of survival 
models. This is the study of determining probability models for time 
to death, failure, or disability. It is not much different from the study 
of determining probability models for the amount or number of claims. 
This edition integrates that subject and in doing so adds an emphasis 
on building empirical models. This is covered in Chapters 10 and 11. 


3. There were two items that had been removed from the actuarial syl- 
labus over the years that we wanted to see returned, at least in brief. 
One is graduation, the smoothing of and interpolation of sequences of 
numbers. This is covered in Chapter 15. The other is the adjustment 
of estimation formulas when dealing with the large amounts of data in 
mortality studies. This is covered in Section 11.4. 


4. While simulation was briefly covered in the first edition, the material 
has been slightly expanded and now appears in Chapter 17. 


With regard to continuing material, besides the rearrangement of the material
on models and modeling, other substantive changes are a significant rewrite
of the ruin theory material (Chapters 7 and 8), and a better explanation of 
the limited fluctuation credibility formulas (Chapter 16). 

While we have attempted to integrate the material into a single, logical 
development of actuarial model building, various sections stand alone, thus 
the division of the text into various parts. 

Since the publication of the first edition, computational power continues 
to increase. For that edition, specialized DOS programs were made available 
for obtaining maximum likelihood estimates and performing aggregate loss 
calculations. Those programs continue to be available at the Wiley ftp site: 

ftp://ftp.wiley.com/public/sci_tech_med/loss_models/

In addition, files containing the data for examples and exercises are also
available. However, it is more likely that users will be calculating using a
spreadsheet program such as Microsoft Excel®.¹ At various places in the text
we indicate how Excel® commands may help. This is not an endorsement by 
the authors, but rather a recognition of the pervasiveness of this tool. 

As in the first edition, many of the exercises are taken from examinations 
of the Casualty Actuarial Society and the Society of Actuaries. They have 
been reworded to fit the terminology and notation of this text and the five 
answer choices from the original questions are not provided. Such exercises 
are indicated with an asterisk (*). Of course, these questions may not be 
representative of those asked on examinations given in the future. 

Finally, a word about our cover picture. In the summer of 1993 the Des 
Moines and Raccoon Rivers flooded, putting sections of Des Moines, Iowa 
under water. At the left center of the picture is the Des Moines Water 
Works. Contamination knocked out the water supply for 12 days. While 
living through adverse events can be interesting, it is not necessary to do so 
to build probability models for their occurrence, timing, and severity. Our 
thanks to Melissa Sharer of the Des Moines Water Works for providing the 
picture. 


S. A. KLUGMAN 
H. H. PANJER 
G. E. WILLMOT 


Des Moines, Iowa 
Waterloo, Ontario 


1 Microsoft® and Excel® are either registered trademarks or trademarks of Microsoft
Corporation in the United States and/or other countries.


Part I


Introduction 


Modeling 


1.1 THE MODEL-BASED APPROACH 


The model-based approach should be considered in the context of the ob- 
jectives of any given problem. Many problems in actuarial science involve 
the building of a mathematical model that can be used to forecast or predict 
insurance costs in the future. 

A model is a simplified mathematical description which is constructed based 
on the knowledge and experience of the actuary combined with data from the 
past. The data guide the actuary in selecting the form of the model as well 
as in calibrating unknown quantities, usually called parameters. The model 
provides a balance between simplicity and conformity to the available data.

The simplicity is measured in terms of such things as the number of un- 
known parameters (the fewer the simpler); the conformity to data is measured 
in terms of the discrepancy between the data and the model. Model selection 
is based on a balance between the two criteria, namely, fit and simplicity. 


1.1.1 The modeling process 


The modeling process is illustrated in Figure 1.1, which describes six stages.


[Figure 1.1: flowchart. Recoverable box labels: Experience and Prior Knowledge; Stage 1, Model Choice; Stage 2, Model Calibration; Stage 3, Model Validation; Stage 5, Model Selection; Stage 6, Modify for Future.]

Fig. 1.1 The modeling process.


Stage 1 One or more models are selected based on the analyst’s prior knowl- 
edge and experience and possibly on the nature and form of available 
data. For example, in studies of mortality, models may contain covariate 
information such as age, sex, duration, policy type, medical information, 
and lifestyle variables. In studies of the size of insurance loss, a statistical
distribution (e.g., lognormal, gamma, or Weibull) may be chosen.


Stage 2 The model is calibrated based on available data. In mortality studies,
these data may be information on a set of life insurance policies. In
studies of property claims, the data may be information about each of
a set of actual insurance losses paid under a set of property insurance
policies.


Stage 3 The fitted model is validated to determine if it adequately conforms
to the data. Various diagnostic tests can be used. These may be well-known
statistical tests, such as the chi-square goodness-of-fit test or the
Kolmogorov-Smirnov test, or may be more qualitative in nature. The 
choice of test may relate directly to the ultimate purpose of the modeling 
exercise. In insurance-related studies, the fitted model is often required 
to replicate in total the losses actually experienced in the data. In 
insurance practice this is often referred to as unbiasedness of a model.


Stage 4 An opportunity is provided to consider other possible models. This
is particularly useful if Stage 3 revealed that all models were inadequate.
It is also possible that more than one valid model will be under
consideration at this stage.


Stage 5 All valid models considered in Stages 1-4 are compared using some
criteria to select between them. This may be done by using the test
results previously obtained or may be done by using another criterion.
Once a winner is selected, the losers may be retained for sensitivity
analyses.


Stage 6 Finally, the selected model is adapted for application to the future.
This could involve adjustment of parameters to reflect anticipated inflation
from the time the data were collected to the period of time to
which the model will be applied.


As new data are collected or the environment changes, the six stages will
need to be repeated to improve the model.
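
The stages above can be sketched in a few lines of code. The following is only an illustration under stated assumptions: the loss amounts are invented, the two candidate models (exponential and lognormal) are chosen for convenience because their maximum likelihood estimates have closed forms, and the function names are hypothetical. It covers Stages 1, 2, 3, and 5; Stage 6 is omitted.

```python
import math

# Hypothetical loss amounts, invented for illustration (not data from the text).
losses = [120.0, 250.0, 310.0, 480.0, 650.0, 900.0, 1400.0, 2300.0, 4100.0, 9800.0]

def fit_exponential(data):
    """Stage 2 for an exponential model: the MLE of the mean is the sample mean."""
    theta = sum(data) / len(data)
    return {"name": "exponential", "params": (theta,),
            "cdf": lambda x: 1.0 - math.exp(-x / theta)}

def fit_lognormal(data):
    """Stage 2 for a lognormal model: MLEs are the mean and (biased) sd of the logs."""
    logs = [math.log(x) for x in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return {"name": "lognormal", "params": (mu, sigma),
            "cdf": lambda x: 0.5 * (1.0 + math.erf((math.log(x) - mu)
                                                   / (sigma * math.sqrt(2.0))))}

def ks_statistic(data, cdf):
    """Stage 3: largest gap between the empirical and fitted distribution functions."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

# Stage 1: candidate models; Stage 2: calibrate each one.
fitted = [fit_exponential(losses), fit_lognormal(losses)]
# Stage 3: validate each fitted model against the data.
for model in fitted:
    model["ks"] = ks_statistic(losses, model["cdf"])
# Stage 5: select, here simply by the smaller Kolmogorov-Smirnov statistic.
selected = min(fitted, key=lambda m: m["ks"])
print(selected["name"], round(selected["ks"], 4))
```

Selecting on the test statistic alone ignores simplicity; a fuller comparison would also weigh the number of parameters (one for the exponential, two for the lognormal), in keeping with the balance between fit and simplicity described earlier.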


1.1.2 The modeling advantage 


Determination of the advantages of using models requires us to consider the 
alternative, decision making based strictly upon empirical evidence. The em- 
pirical approach assumes that the future can be expected to be exactly like a 
sample from the past, perhaps adjusted for trends such as inflation. Consider 
the following illustration. 


Example 1.1 A portfolio of group life insurance certificates consists of 1,000 
employees of various ages and death benefits. Over the past five years, 14 em- 
ployees died and received a total of 580,000 in benefits (adjusted for inflation 
because the plan relates benefits to salary). Determine the empirical estimate
of next year’s expected benefit payment. 


The empirical estimate for next year is then 116,000 (one-fifth of the total),
which would need to be further adjusted for benefit increases. The danger, of
course, is that it is unlikely that the experience of the past five years accurately
reflects the future of this portfolio as there can be considerable fluctuation in
such short-term results. □
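
The arithmetic of Example 1.1 can be checked directly. The short sketch below uses only the figures given in the example; the sensitivity calculation (one additional death at the average benefit) is our own illustration of how much such a small sample can move the estimate, not a calculation from the text.

```python
# Figures taken from Example 1.1.
years = 5
deaths = 14
total_benefits = 580_000

empirical_estimate = total_benefits / years   # expected payment next year
average_benefit = total_benefits / deaths     # average payment per death

# Illustrative sensitivity: one more death at the average benefit.
estimate_with_one_more_death = (total_benefits + average_benefit) / years
relative_change = estimate_with_one_more_death / empirical_estimate - 1

print(empirical_estimate)           # 116000.0
print(round(relative_change, 4))    # 0.0714
```

A single additional death would have moved the estimate by about 7%, which illustrates the fluctuation the example warns about.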


It seems much more reasonable to build a model, in this case a mortality
table. This table would be based on the experience of many lives, not just the
1,000 in our group. With this model we not only can estimate the expected 
payment for next year, but we can also measure the risk involved by calculating 
the standard deviation of payments or perhaps various percentiles from the 
distribution of payments. This is precisely the problem covered in great detail 
in Actuarial Mathematics [16]. 




This approach was codified by the Society of Actuaries Committee on Ac- 
tuarial Principles. In the publication “Principles of Actuarial Science” [123], 
p. 571, Principle 3.1 states that “Actuarial risks can be stochastically modeled 
based on assumptions regarding the probabilities that will apply to the actu- 
arial risk variables in the future, including assumptions regarding the future 
environment.” The actuarial risk variables referred to are occurrence, timing, 
and severity—that is, the chances of a claim event, the time at which the 
event occurs if it does, and the cost of settling the claim. 


1.2 ORGANIZATION OF THIS BOOK


This text takes the reader through the modeling process, but not in the order 
presented in the previous section. There is a difference between how models 
are best applied and how they are best learned. In this text we first learn 
about the models and how to use them. This is followed by instruction in how 
to determine which model to use. The reason is that it is difficult to select 
models in a vacuum. Unless the analyst has a thorough knowledge of the 
set of available models, it is difficult to narrow the choice to the ones worth 
considering. With that in mind, the organization of the text is as follows: 


1. Review of probability—Almost by definition, contingent events imply 
probability models. Chapters 2 and 3 review random variables and some 
of the basic calculations that may be done with such models, including 
moments and percentiles. 


2. Understanding probability distributions—In order to select a probability 
model, the analyst should possess a reasonably large collection of such 
models. In addition, in order to make a good a priori model choice, 
characteristics of these models should be available. In Chapter 4 a 
variety of distributional models are introduced and their characteristics 
explored. This includes both continuous and discrete distributions. 


3. Coverage modifications—Insurance contracts often do not provide full 
payment. For example, there may be a deductible (the insurance policy 
does not pay the first $250, for example) or a limit (the insurance policy 
does not pay more than $10,000 for any one loss event, for example). 
Such modifications alter the probability distribution and affect related
calculations such as moments. Chapter 5 shows how this is done. 


4. Aggregate losses and ruin—To this point the models are either for the 
amount of a single payment or for the number of payments. Of interest 
when modeling a portfolio, line of business, or entire company is the
total amount paid. A model that combines the probabilities concerning 
the number of payments and the amounts of each payment is called 
an aggregate loss model. Calculations for such models are covered in
Chapter 6. Usually, the payments arrive sequentially through time. It
is possible, if the payments turn out to be large, that at some point 
the entity will run out of money. This state of affairs is called ruin. In 
Chapters 7 and 8 models are established that allow for the calculation 
of the probability this will happen. 


5. Review of mathematical statistics—Because most of the models being
considered are probability models, techniques of mathematical statistics
are needed to estimate model specifications and make choices. While
Chapter 9 is not a replacement for a thorough text or course in mathematical
statistics, it does contain the essential items needed later in this
text.


6. Construction of empirical models—Sometimes it is appropriate to work
with the empirical distribution of the data. It may be because the volume
of data is sufficient or because a good portrait of the data is needed.
Chapters 10 and 11 cover this for the simple case of straightforward data, 
adjustments for truncated and censored data, and modifications suitable 
for large data sets, particularly those encountered in mortality studies. 


7. Construction of parametric models—Often it is valuable to smooth the
data and thus represent the population by a probability distribution.
Chapter 12 provides methods for parameter estimation for the models 
introduced earlier. Model selection is covered in Chapter 13. 


8. Chapter 14 contains examples that summarize and integrate the topics
discussed to this point.


9. Adjustment of estimates—At times, further adjustment of the results
is needed. Two such adjustments are covered in this text. The first is
interpolation and smoothing (also called graduation) and is covered in 
Chapter 15, where the emphasis is on cubic splines. There are situa- 
tions, such as time to death, where no simple probability distribution is 
known to describe the observations. An empirical approach is likely to 
produce results that are not as smooth as the population is known to 
be. Graduation methods can provide the needed adjustment. A second 
situation occurs when there are one or more estimates based on a small 
number of observations. Accuracy could be improved by adding other, 
related observations, but care must be taken if the additional data are 
from a different population. Credibility methods, covered in Chapter 
16, provide a mechanism for making the appropriate adjustment when 
additional data are to be included. 


10. Simulation—When analytical results are difficult to obtain, simulation
(use of random numbers) may provide the needed answer. A brief in- 
troduction to this technique is provided in Chapter 17. 
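
As a small illustration of the coverage modifications mentioned in item 3, the sketch below computes the payment on a single loss under the deductible of $250 and the limit of $10,000 used there. It assumes the limit caps the amount paid after the deductible has been applied; other contract wordings are possible. The function name and the sample losses are hypothetical.

```python
def payment(loss, deductible=250.0, limit=10_000.0):
    """Amount paid on one loss, assuming the limit caps the payment after the deductible."""
    return min(max(loss - deductible, 0.0), limit)

# Hypothetical losses, invented for illustration.
losses = [100.0, 1_000.0, 50_000.0]
payments = [payment(x) for x in losses]

print(payments)        # [0.0, 750.0, 10000.0]
print(sum(payments))   # 10750.0
```

The sum in the last line is the kind of total that an aggregate loss model (item 4) treats as a random variable.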


Part II 


Actuarial models 




Random variables 


2.1 INTRODUCTION 


An actuarial model is an attempt to represent an uncertain stream of future 
payments. The uncertainty may be with respect to any or all of occurrence (is 
there a payment?), timing (when is the payment made?), and severity (how 
much is paid?). Because the most useful means of representing uncertainty 
is through probability, we concentrate on probability models. In all cases, 
the relevant probability distributions are assumed to be known. Determin- 
ing appropriate distributions is covered in Chapters 10-13. In this part, the 
following aspects of actuarial probability models will be covered: 


1. Definition of random variable, important functions, and some examples.

2. Basic calculations from probability models.

3. Specific probability distributions and their properties.

4. More advanced calculations using severity models.

5. Models incorporating the possibility of a random number of payments
each of random amount.


6. Models that track a company’s surplus through time. 


Loss Models: From Data to Decisions, Second Edition. 
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot 
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.


There are two important models that are absent from this text. The first
is a model for investment earnings in future years. While such techniques are 
within the scope of this text, the additional finance background needed to 
motivate these models would detract from the primary purpose of this text. 
The second is a model that combines the earning of interest (whether random
or not) with the timing of the payment. While simple models of this type 
may show up in examples, thorough coverage of the use of these models for 
life insurance and annuities is sufficiently specialized to require a separate 
text, such as [16]. 

The commonality we seek here is that all models for random phenomena
have similar elements. For each, there is a set of possible outcomes. The
particular outcome that occurs will determine the success of our enterprise. 
Attaching probabilities to the various outcomes allows us to quantify our 
expectations and the risk of not meeting them. In this spirit, the underlying 
random variable will almost always be denoted X or Y. The context will 
provide a name and some likely characteristics. Of course, there are actuarial 
models that do not look like those covered here. For example, in life insurance 
a model office is a list of cells containing policy type, age range, gender, and 
so on. 

To expand on this concept, consider the following definitions from the latest 
working draft of “Joint Principles of Actuarial Science”¹:


Phenomena are occurrences that can be observed. An experiment is 
an observation of a given phenomenon under specified conditions. The 
result of an experiment is called an outcome; an event is a set of one or 
more possible outcomes. A stochastic phenomenon is a phenomenon for 
which an associated experiment has more than one possible outcome. 
An event associated with a stochastic phenomenon is said to be contin- 
gent. Probability is a measure of the likelihood of the occurrence of an 
event. It is measured on a scale of increasing likelihood from zero to 
one. A random variable is a function that assigns a numerical value to 
every possible outcome. 


The following list contains a number of random variables encountered in 
actuarial work: 


1. The age at death of a randomly selected birth. (Model 1) 


2. The time to death from when insurance was purchased for a randomly 
selected insured life. 


3. The time from occurrence of a disabling event to recovery or death for 
a randomly selected workers compensation claimant. 


1 This document is a work in progress of a joint committee from the Casualty Actuarial
Society and the Society of Actuaries. Key principles are that models exist that represent
actuarial phenomena and that given sufficient data it is possible to calibrate models.


4. The time from the incidence of a randomly selected claim to its being
reported to the insurer. 


5. The time from the reporting of a randomly selected claim to its settle- 
ment. 


6. The number of dollars paid on a randomly selected life insurance claim. 


7. The number of dollars paid on a randomly selected automobile bodily 
injury claim. (Model 2) 


8. The number of automobile bodily injury claims in one year from a ran- 
domly selected insured automobile. (Model 3) 


9. The total dollars in medical malpractice claims paid in one year owing 
to events at a randomly selected hospital. (Model 4) 


10. The time to default or prepayment on a randomly selected insured home 
loan that terminates early. 


11. The amount of money paid at maturity on a randomly selected high- 
yield bond. 


Because all of these phenomena can be expressed as random variables, the 
machinery of probability and mathematical statistics is at our disposal both 
to create and to analyze models for them. The following paragraphs discuss 
the five key functions used in describing a random variable. They will be 
illustrated with four ongoing models as identified in the list above plus two 
more to be introduced later. 


2.2 KEY FUNCTIONS AND FOUR MODELS 


Definition 2.1 The cumulative distribution function, also called the
distribution function and usually denoted $F_X(x)$ or $F(x)$, for a random
variable $X$ is the probability that $X$ is less than or equal to a given number.
That is, $F_X(x) = \Pr(X \le x)$. The abbreviation cdf is often used.


The distribution function must satisfy a number of requirements:³


• $0 \le F(x) \le 1$ for all $x$.


2 When denoting functions associated with random variables, it is common to identify the
random variable through a subscript on the function. Here, subscripts will be used only
when needed to distinguish one random variable from another. In addition, for the six
models to be introduced shortly, rather than write the distribution function for random
variable 2 as $F_{X_2}(x)$, it will simply be denoted $F_2(x)$.

3 The first point follows from the last three.


• $F(x)$ is nondecreasing.

• $F(x)$ is right-continuous.⁴

• $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

Because it need not be left-continuous, it is possible for the distribution function
to jump. When it jumps, the value is assigned to the top of the jump.
Here are possible distribution functions for each of the four models.


Model 1⁵ This random variable could serve as a model for the age at death.
All ages between 0 and 100 are possible. While experience suggests that there
is an upper bound for human lifetime, models with no upper limit may be
useful if they assign extremely low probabilities to extreme ages. This allows
the modeler to avoid setting a specific maximum age.


$$F_1(x) = \begin{cases} 0, & x < 0, \\ 0.01x, & 0 \le x < 100, \\ 1, & x \ge 100. \end{cases} \qquad \square$$


Model 2 This random variable could serve as a model for the number of 
dollars paid on an automobile insurance claim. All positive values are possible. 
As with mortality, there is more than likely an upper limit (all the money 
in the world comes to mind), but this model illustrates that in modeling, 
correspondence to reality need not be perfect. 


$$F_2(x) = \begin{cases} 0, & x < 0, \\ 1 - \left(\dfrac{2{,}000}{x + 2{,}000}\right)^3, & x \ge 0. \end{cases} \qquad \square$$


Example 2.2 Draw graphs of the distribution function for Models 1 and 2
(graphs for the other models are requested in Exercise 2.2).


The graphs appear in Figures 2.1 and 2.2. □


Model 3 This random variable could serve as a model for the number of
claims on one policy in one year. Probability is concentrated at the five points
(0,1,2,3,4) and the probability at each is given by the size of the jump in the
distribution function. While this model places a maximum on the number of
claims, models with no limit (such as the Poisson distribution) could also be
used.


4 Right-continuous means that at any point $x_0$ the limiting value of $F(x)$ as $x$ approaches
$x_0$ from the right is equal to $F(x_0)$. This need not be true as $x$ approaches $x_0$ from the left.
5 The six models (four introduced here and two later) will be identified by the numbers 1-6.
Other examples will use the traditional numbering scheme as used for Definitions, etc.


Fig. 2.1 Distribution function for Model 1.


Fig. 2.2 Distribution function for Model 2.


$$F_3(x) = \begin{cases} 0, & x < 0, \\ 0.5, & 0 \le x < 1, \\ 0.75, & 1 \le x < 2, \\ 0.87, & 2 \le x < 3, \\ 0.95, & 3 \le x < 4, \\ 1, & x \ge 4. \end{cases} \qquad \square$$


Model 4 This random variable could serve as a model for the total dollars
paid on a malpractice policy in one year. Most of the probability is at zero
(0.7) because in most years nothing is paid. The remaining 0.3 of probability
is distributed over positive values.

$$F_4(x) = \begin{cases} 0, & x < 0, \\ 1 - 0.3e^{-0.00001x}, & x \ge 0. \end{cases} \qquad \square$$


Definition 2.3 The support of a random variable is the set of numbers that 
are possible values of the random variable. 


Definition 2.4 A random variable is called discrete if the support contains
at most a countable number of values. It is called continuous if the distribution
function is continuous and is differentiable everywhere with the possible
exception of a countable number of values. It is called mixed if it is not discrete
and is continuous everywhere with the exception of at least one value
and at most a countable number of values.


These three definitions do not exhaust all possible random variables but 
will cover all cases encountered in this text. The distribution function for a 
discrete variable will be constant except for jumps at the values with positive 
probability. A mixed distribution will have at least one jump. Requiring 
continuous variables to be differentiable allows the variable to have a density 
function (defined later) at almost all values. 


Example 2.5 For each of the four models, determine the support and indi- 
cate which type of random variable it is. 


The distribution function for Model 1 is continuous and is differentiable 
except at 0 and 100 and therefore is a continuous distribution. The support 
is values from 0 to 100 with it not being clear if 0 or 100 are included. The 
distribution function for Model 2 is continuous and is differentiable except 
at 0 and therefore is a continuous distribution. The support is all positive 
numbers and perhaps 0. The random variable for Model 3 places probability 
only at 0, 1, 2, 3, and 4 (the support) and thus is discrete. The distribution 
function for Model 4 is continuous except at 0, where it jumps. It is a mixed 
distribution with support on nonnegative numbers. □
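
The four distribution functions can be written as ordinary functions and checked against the requirements listed after Definition 2.1. The sketch below is an informal numerical check on a grid, not a proof. It assumes the Model 2 distribution function is $1 - (2{,}000/(x + 2{,}000))^3$ for $x \ge 0$ and the Model 4 function is $1 - 0.3e^{-0.00001x}$ for $x \ge 0$; the helper name `looks_like_cdf` is hypothetical, and right-continuity is not tested.

```python
import math

def F1(x):
    return 0.0 if x < 0 else (0.01 * x if x < 100 else 1.0)

def F2(x):
    return 0.0 if x < 0 else 1.0 - (2000.0 / (x + 2000.0)) ** 3

def F3(x):
    if x < 0:
        return 0.0
    steps = [0.5, 0.75, 0.87, 0.95]   # values of F3 on [0,1), [1,2), [2,3), [3,4)
    k = math.floor(x)
    return steps[k] if k < 4 else 1.0

def F4(x):
    return 0.0 if x < 0 else 1.0 - 0.3 * math.exp(-0.00001 * x)

def looks_like_cdf(F, grid):
    """Check boundedness, monotonicity, and tail limits on a finite grid."""
    values = [F(x) for x in grid]
    bounded = all(0.0 <= v <= 1.0 for v in values)
    nondecreasing = all(a <= b for a, b in zip(values, values[1:]))
    tails = F(-1e12) == 0.0 and abs(F(1e12) - 1.0) < 1e-9
    return bounded and nondecreasing and tails

grid = [i / 2 for i in range(-20, 401)]   # -10.0, -9.5, ..., 200.0
for F in (F1, F2, F3, F4):
    print(F.__name__, looks_like_cdf(F, grid))
```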


These four models illustrate the most commonly encountered forms of the 
distribution function. For the remainder of this text, values of functions like 
the distribution function will be presented only for values in the range of the 
support of the random variable. 


Definition 2.6 The survival function, usually denoted $S_X(x)$ or $S(x)$, for
a random variable $X$ is the probability that $X$ is greater than a given number.
That is, $S_X(x) = \Pr(X > x) = 1 - F_X(x)$.

As a result: 


• $0 \le S(x) \le 1$ for all $x$.

• $S(x)$ is nonincreasing.

• $S(x)$ is right-continuous.

• $\lim_{x \to -\infty} S(x) = 1$ and $\lim_{x \to \infty} S(x) = 0$.


Because the survival function need not be left-continuous, it is possible for it
to jump (down). When it jumps, the value is assigned to the bottom of the
jump.

Because the survival function is the complement of the distribution function,
knowledge of one implies knowledge of the other. Historically, when the
random variable is measuring time, the survival function is presented, while
when it is measuring dollars, the distribution function is presented.


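Example 2.7 For completeness, here are the survival functions for the four
models.

$$S_1(x) = 1 - 0.01x, \quad 0 \le x < 100,$$

$$S_2(x) = \left(\frac{2{,}000}{x + 2{,}000}\right)^3, \quad x \ge 0,$$

$$S_3(x) = \begin{cases} 0.5, & 0 \le x < 1, \\ 0.25, & 1 \le x < 2, \\ 0.13, & 2 \le x < 3, \\ 0.05, & 3 \le x < 4, \\ 0, & x \ge 4, \end{cases}$$

$$S_4(x) = 0.3e^{-0.00001x}, \quad x \ge 0. \qquad \square$$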


Example 2.8 Graph the survival function for Models 1 and 2.


The graphs appear in Figures 2.3 and 2.4. □


Either the distribution or survival function can be used to determine probabilities.
Let $F(b-) = \lim_{x \nearrow b} F(x)$ and let $S(b-)$ be similarly defined. That
is, we want the limit as $x$ approaches $b$ from below. We have
$\Pr(a < X \le b) = F(b) - F(a) = S(a) - S(b)$ and $\Pr(X = b) = F(b) - F(b-) = S(b-) - S(b)$.
When the distribution function is continuous at $x$, $\Pr(X = x) = 0$; otherwise
the probability is the size of the jump. The next two functions are more directly
related to the probabilities. The first is for continuous distributions,
the second for discrete distributions.
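
The two probability formulas above can be tried on Models 3 and 4. In this sketch the left limit $F(b-)$ is approximated by evaluating $F$ slightly below $b$, which is a numerical shortcut rather than a true limit; the function names are hypothetical, and the Model 4 distribution function is assumed to be $1 - 0.3e^{-0.00001x}$ for $x \ge 0$.

```python
import math

def F3(x):
    if x < 0:
        return 0.0
    steps = [0.5, 0.75, 0.87, 0.95]   # values of F3 on [0,1), [1,2), [2,3), [3,4)
    k = math.floor(x)
    return steps[k] if k < 4 else 1.0

def F4(x):
    return 0.0 if x < 0 else 1.0 - 0.3 * math.exp(-0.00001 * x)

def prob_interval(F, a, b):
    """Pr(a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)

def prob_point(F, b, eps=1e-9):
    """Pr(X = b) = F(b) - F(b-), with F(b-) approximated by F(b - eps)."""
    return F(b) - F(b - eps)

print(round(prob_interval(F3, 0, 2), 4))   # 0.37
print(round(prob_point(F3, 3), 4))         # 0.08
print(round(prob_point(F4, 0), 4))         # 0.7
print(abs(prob_point(F4, 5000)) < 1e-9)    # True: no jump where F4 is continuous
```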


Fig. 2.4 Survival function for Model 2.


Definition 2.9 The probability density function, also called the density
function, usually denoted $f_X(x)$ or $f(x)$, is the derivative of the distribution
function or, equivalently, the negative of the derivative of the survival function.
That is, $f(x) = F'(x) = -S'(x)$. The density function is defined only at those
points where the derivative exists. The abbreviation pdf is often used.


While the density function does not directly provide probabilities, it does 
provide relevant information. Values of the random variable in regions with 
higher density values are more likely to occur than those in regions with lower 
values. Probabilities for intervals and the distribution and survival functions 
can be recovered by integration. That is, when the density function is defined 
over the relevant interval, $\Pr(a < X \le b) = \int_a^b f(x)\,dx$, $F(b) = \int_{-\infty}^b f(x)\,dx$,
and $S(b) = \int_b^\infty f(x)\,dx$.


Fig. 2.5 Density function for Model 1.


Example 2.10 For our models,

$$f_1(x) = 0.01, \quad 0 < x < 100,$$

$$f_2(x) = \frac{3(2{,}000)^3}{(x + 2{,}000)^4}, \quad x > 0,$$

$f_3(x)$ is not defined,

$$f_4(x) = 0.000003e^{-0.00001x}, \quad x > 0.$$


It should be noted that for Model 4 the density function does not completely 
describe the probability distribution. As a mixed distribution, there is also 
discrete probability at 0. □
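
Both points can be checked numerically: integrating the Model 2 density over an interval reproduces the difference of distribution function values, while integrating the Model 4 density over the positive numbers gives only 0.3, the continuous part of the probability. The sketch below uses a simple midpoint rule (our choice, for illustration) and assumes $f_2(x) = 3(2{,}000)^3/(x + 2{,}000)^4$ and $F_2(x) = 1 - (2{,}000/(x + 2{,}000))^3$; the upper limit of 2,000,000 stands in for infinity.

```python
import math

def f2(x):
    """Model 2 density, x > 0."""
    return 3.0 * 2000.0 ** 3 / (x + 2000.0) ** 4

def F2(x):
    """Model 2 distribution function, x >= 0."""
    return 1.0 - (2000.0 / (x + 2000.0)) ** 3

def f4(x):
    """Continuous part of Model 4, x > 0."""
    return 0.000003 * math.exp(-0.00001 * x)

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Pr(500 < X <= 1500) for Model 2, by integration and from F2.
interval_prob = integrate(f2, 500.0, 1500.0)
print(round(interval_prob, 6), round(F2(1500.0) - F2(500.0), 6))

# The Model 4 density accounts for only 0.3 of the probability.
continuous_mass_4 = integrate(f4, 0.0, 2_000_000.0)
print(round(continuous_mass_4, 4))   # 0.3
```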


Example 2.11 Graph the density function for Models 1 and 2.

The graphs appear in Figures 2.5 and 2.6. □


Definition 2.12 The probability function, also called the probability mass
function, usually denoted $p_X(x)$ or $p(x)$, describes the probability at a distinct
point when it is not 0. The formal definition is $p_X(x) = \Pr(X = x)$.


For discrete random variables, the distribution and survival functions can
be recovered as $F(x) = \sum_{y \le x} p(y)$ and $S(x) = \sum_{y > x} p(y)$.
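
For Model 3 these sums are easy to carry out. In the sketch below the probabilities are the jump sizes of the Model 3 distribution function given earlier (0.5, 0.75, 0.87, 0.95, 1); the function names `cdf` and `sf` are hypothetical.

```python
# Jump sizes of the Model 3 distribution function at 0, 1, 2, 3, 4.
p3 = {0: 0.50, 1: 0.25, 2: 0.12, 3: 0.08, 4: 0.05}

def cdf(p, x):
    """F(x): sum of p(y) over support points y <= x."""
    return sum(prob for y, prob in p.items() if y <= x)

def sf(p, x):
    """S(x): sum of p(y) over support points y > x."""
    return sum(prob for y, prob in p.items() if y > x)

print(round(cdf(p3, 2), 2), round(sf(p3, 2), 2))   # 0.87 0.13
```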


Fig. 2.6 Density function for Model 2.


Example 2.13 For our models,

$p_1(x)$ is not defined,

$p_2(x)$ is not defined,

$$p_3(x) = \begin{cases} 0.50, & x = 0, \\ 0.25, & x = 1, \\ 0.12, & x = 2, \\ 0.08, & x = 3, \\ 0.05, & x = 4, \end{cases}$$

$$p_4(0) = 0.7.$$


It is again noted that the distribution in Model 4 is mixed, so the above 
describes only the discrete portion of that distribution. There is no easy wa; 
to present probabilities/densities for a mixed distribution. On the other ees 
they tend to be more revealing of the mixed nature of the distribution For 
Model 4 we would present the probability density function as l 


_ J 0.7, =0 
fala) = { 0.000003e~0-000012, g > 9) 


realizing that, technically, it is not a probability density function at all. When 
the density function is assigned a value at a specific point, as opposed to being 
defined on an interval, it is understood to be a discrete probability mass. O 


Definition 2.14 The hazard rate, also known as the force of mortality 
and the failure rate and usually denoted $h_X(x)$ or $h(x)$, is the ratio of the 
density and survival functions when the density function is defined. That is, 

$$h_X(x) = f_X(x)/S_X(x).$$

When called the force of mortality, the hazard rate is often denoted $\mu(x)$, 
and when called the failure rate, it is often denoted $\lambda(x)$. Regardless, it 
may be interpreted as the probability density at $x$ given that the argument 
will be at least $x$. We also have $h_X(x) = -S'(x)/S(x) = -d\ln S(x)/dx$. 
The survival function can be recovered from $S(b) = e^{-\int_0^b h(x)\,dx}$. Though not 
necessary, this formula implies that the support is on nonnegative numbers. 
In mortality terms, the force of mortality is the annualized probability⁶ that 
a person age $x$ will die in the next instant, expressed as a death rate per year. 
In this text we will always use $h(x)$ to denote the hazard rate, although one 
of the alternative names may be used. 


Example 2.15 For our models, 

$$
\begin{aligned}
h_1(x) &= \frac{0.01}{1 - 0.01x}, && 0 < x < 100,\\
h_2(x) &= \frac{3}{x + 2{,}000}, && x > 0,\\
h_3(x) &\ \text{is not defined},\\
h_4(x) &= 0.00001, && x > 0.
\end{aligned}
$$

Once again, note that for the mixed distribution the hazard rate is only 
defined over part of the random variable's support. This is different from the 
problem above where both a probability density function and a probability 
function are involved. Where there is a discrete probability mass, the hazard 
rate is not defined. □
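
The relationship $S(b) = e^{-\int_0^b h(x)\,dx}$ can be checked numerically for the two hazard rates that are defined on an interval starting at 0. The following is a minimal sketch; the function names, the midpoint rule, and the evaluation points 40 and 3,000 are illustrative assumptions.

```python
import math

def hazard_model1(x):
    return 0.01 / (1 - 0.01 * x)      # valid for 0 < x < 100

def hazard_model2(x):
    return 3 / (x + 2000)             # valid for x > 0

def survival_from_hazard(hazard, b, steps=20_000):
    """Approximate S(b) = exp(-integral of h from 0 to b) by a midpoint rule."""
    width = b / steps
    cumulative_hazard = width * sum(hazard((i + 0.5) * width) for i in range(steps))
    return math.exp(-cumulative_hazard)

s1_at_40 = survival_from_hazard(hazard_model1, 40.0)      # compare with 1 - 0.01(40)
s2_at_3000 = survival_from_hazard(hazard_model2, 3000.0)  # compare with (2000/5000)^3

print(round(s1_at_40, 6), round(s2_at_3000, 6))
```

Both values should agree closely with the survival functions implied by the densities in Example 2.10, namely 0.6 and 0.064.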


Example 2.16 Graph the hazard rate function for Models 1 and 2. 


The graphs appear in Figures 2.7 and 2.8. □


The following model illustrates a situation in which there is a point where 
the density and hazard rate functions are not defined. 


Model 5 An alternative to the simple lifetime distribution in Model 1 is given 
below. Note that it is piecewise linear and the derivative at 50 is not defined. 
Therefore, neither the density function nor the hazard rate function is defined 
at 50. Unlike the mixed model of Model 4, there is no discrete probability 
mass at this point. Because the probability of 50 occurring is zero, the density 
or hazard rate at 50 could be arbitrarily defined with no effect on subsequent 
calculations. In this text such values will be arbitrarily defined so that the 
function is right continuous.⁷ See the solution to Exercise 2.1 for an example. 

$$
S_5(x) = \begin{cases}
1 - 0.01x, & 0 \le x < 50,\\
1.5 - 0.02x, & 50 \le x < 75.
\end{cases}
$$

Fig. 2.7 Hazard rate function for Model 1. 

Fig. 2.8 Hazard rate function for Model 2. 

⁶Note that the force of mortality is not a probability (in particular, it can be greater than 
1) although it does no harm to visualize it as a probability. 

⁷By arbitrarily defining the value of the density or hazard rate function at such a point, 
it is clear that using either of them to obtain the survival function will work. If there is 
discrete probability at this point (in which case these functions are left undefined), then 
the density and hazard functions are not sufficient to completely describe the probability 
distribution. 


A variety of commonly used continuous distributions are presented in 
Appendix A and many discrete distributions are presented in Appendix B. An 
interesting feature of a random variable is the value that is most likely to 
occur. 


Definition 2.17 The mode of a random variable is the most likely value. For 
a discrete variable it is the value with the largest probability. For a continuous 
variable it is the value for which the density function is largest. If there are 
local maxima, these points are also considered to be modes. 


Example 2.18 Where possible, determine the mode for Models 1-5. 


For Model 1, the density function is constant. All values from 0 to 100 
could be the mode, or equivalently, it could be said that there is no mode. 
For Model 2, the density function is strictly decreasing and so the mode is at 
0. For Model 3, the probability is highest at 0. As a mixed distribution, it 
is not possible to define a mode for Model 4. Model 5 has a density that is 
constant over two intervals, with higher values from 50 to 75. These values 
are all modes. □


2.2.1 Exercises 


2.1 Determine the distribution, density, and hazard rate functions for Model 
5. 


2.2 Construct graphs of the distribution function for Models 3-5. Also graph 
the density or probability function as appropriate and the hazard rate func- 
tion, where it exists. 


2.3 (*) A random variable $X$ has density function $f(x) = 4x(1 + x^2)^{-3}$, 
$x > 0$. Determine the mode of $X$. 


2.4 (*) A nonnegative random variable has a hazard rate function of $h(x) = 
A + e^{2x}$, $x \ge 0$. You are also given $S(0.4) = 0.5$. Determine the value of $A$. 


2.5 (*) $X$ has a Pareto distribution with parameters $\alpha = 2$ and $\theta = 10{,}000$. 
$Y$ has a Burr distribution with parameters $\alpha = 2$, $\gamma = 2$, and $\theta = \sqrt{20{,}000}$. 
Let $r$ be the ratio of $\Pr(X > d)$ to $\Pr(Y > d)$. Determine $\lim_{d \to \infty} r$. 


Basic distributional 
quantities 


3.1 MOMENTS 


There are a variety of interesting calculations that can be done from the 
models described in the previous chapter. Examples are the average amount 
paid on a claim that is subject to a deductible or policy limit or the average 
remaining lifetime of a person age 40. 


Definition 3.1 The kth raw moment of a random variable is the expected 
(average) value of the kth power of the variable, provided it exists. It is denoted 
by $E(X^k)$ or by $\mu_k'$. The first raw moment is called the mean of the random 
variable and is usually denoted by $\mu$. 


Note that $\mu$ is not related to $\mu(x)$, the force of mortality introduced in 
Definition 2.14. For random variables that take on only nonnegative values [that 
is, $\Pr(X \ge 0) = 1$], $k$ may be any real number. When presenting formulas 
for calculating this quantity, a distinction between continuous and discrete 
variables needs to be made. Formulas will be presented for random variables 
that are either everywhere continuous or everywhere discrete. For mixed 
models, evaluate the formula by integrating with respect to its density function 
wherever the random variable is continuous and by summing with respect to 
its probability function wherever the random variable is discrete and adding 
the results. The formula for the kth raw moment is 

$$
\mu_k' = E(X^k) = \begin{cases}
\displaystyle\int_{-\infty}^{\infty} x^k f(x)\,dx & \text{if the random variable is continuous,}\\[2ex]
\displaystyle\sum_j x_j^k\, p(x_j) & \text{if the random variable is discrete,}
\end{cases}
\tag{3.1}
$$

where the sum is to be taken over all $x_j$ with positive probability. Finally, 
note that it is possible that the integral or sum will not converge, in which 
case the moment is said not to exist. 


Example 3.2 Determine the first two raw moments for each of the five models. 


The subscripts on the random variable $X$ indicate which model is being 
used. 

$$
\begin{aligned}
E(X_1) &= \int_0^{100} x(0.01)\,dx = 50,\\
E(X_1^2) &= \int_0^{100} x^2(0.01)\,dx = 3{,}333.33,\\
E(X_2) &= \int_0^{\infty} x\,\frac{3(2{,}000)^3}{(x + 2{,}000)^4}\,dx = 1{,}000,\\
E(X_2^2) &= \int_0^{\infty} x^2\,\frac{3(2{,}000)^3}{(x + 2{,}000)^4}\,dx = 4{,}000{,}000,\\
E(X_3) &= 0(0.5) + 1(0.25) + 2(0.12) + 3(0.08) + 4(0.05) = 0.93,\\
E(X_3^2) &= 0(0.5) + 1(0.25) + 4(0.12) + 9(0.08) + 16(0.05) = 2.25,\\
E(X_4) &= 0(0.7) + \int_0^{\infty} x(0.000003)e^{-0.00001x}\,dx = 30{,}000,\\
E(X_4^2) &= 0^2(0.7) + \int_0^{\infty} x^2(0.000003)e^{-0.00001x}\,dx = 6{,}000{,}000{,}000,\\
E(X_5) &= \int_0^{50} x(0.01)\,dx + \int_{50}^{75} x(0.02)\,dx = 43.75,\\
E(X_5^2) &= \int_0^{50} x^2(0.01)\,dx + \int_{50}^{75} x^2(0.02)\,dx = 2{,}395.83.
\end{aligned}
$$
□
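
The discrete and continuous branches of (3.1) can be sketched in a few lines. The code below reproduces the Model 3 and Model 5 figures from Example 3.2; the helper names and the midpoint rule used for the integral are assumptions made for this illustration only.

```python
def raw_moment_discrete(pmf, k):
    """Discrete case of (3.1): sum of x_j^k p(x_j) over the support."""
    return sum(x**k * p for x, p in pmf.items())

def raw_moment_piecewise(pieces, k, steps=10_000):
    """Continuous case of (3.1) for a density that is constant on each piece.

    pieces is a list of (lower, upper, density_value) tuples; each integral is
    approximated by a composite midpoint rule.
    """
    total = 0.0
    for lower, upper, height in pieces:
        width = (upper - lower) / steps
        total += width * sum(
            (lower + (i + 0.5) * width) ** k * height for i in range(steps)
        )
    return total

model3 = {0: 0.50, 1: 0.25, 2: 0.12, 3: 0.08, 4: 0.05}
model5 = [(0.0, 50.0, 0.01), (50.0, 75.0, 0.02)]

m3_first, m3_second = raw_moment_discrete(model3, 1), raw_moment_discrete(model3, 2)
m5_first, m5_second = raw_moment_piecewise(model5, 1), raw_moment_piecewise(model5, 2)

print(m3_first, m3_second, round(m5_first, 2), round(m5_second, 2))
```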


Before proceeding further, an additional model will be introduced. This 
one looks similar to Model 3, but with one key difference. It is discrete, 
but with the added requirement that all of the probabilities must be integral 
multiples of some number. In addition, the model must be related to sample 
data in a particular way. 

Definition 3.3 The empirical model is a discrete distribution based on a 
sample of size $n$ which assigns probability $1/n$ to each data point. 


Model 6 Consider a sample of size 8 in which the observed data points were 
3, 5, 6, 6, 6, 7, 7, and 10. The empirical model then has probability function 

$$
p_6(x) = \begin{cases}
0.125, & x = 3,\\
0.125, & x = 5,\\
0.375, & x = 6,\\
0.25, & x = 7,\\
0.125, & x = 10.
\end{cases}
$$
□


Alert readers will note that many discrete models with finite support look 
like empirical models. Model 3 could have been the empirical model for a 
sample of size 100 that contained 50 zeros, 25 ones, 12 twos, 8 threes, and 5 
fours. Regardless, we will use the term empirical model only when there is an 
actual sample behind it. The two moments for Model 6 are 

$$E(X_6) = 6.25, \qquad E(X_6^2) = 42.5$$

using the same approach as in Model 3. It should be noted that the mean 
of this random variable is equal to the sample arithmetic average (also called 
the sample mean). 
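
Building the empirical model from a sample is mechanical, as the following sketch suggests. It uses exact fractions to avoid rounding; the function name `empirical_pmf` is an assumption for illustration.

```python
from collections import Counter
from fractions import Fraction

sample = [3, 5, 6, 6, 6, 7, 7, 10]   # the Model 6 data

def empirical_pmf(data):
    """Assign probability 1/n to each data point, merging repeated values."""
    n = len(data)
    return {x: Fraction(count, n) for x, count in sorted(Counter(data).items())}

pmf = empirical_pmf(sample)
mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x**2 * p for x, p in pmf.items())

print({x: float(p) for x, p in pmf.items()})
print(float(mean), float(second_moment))
```

The first raw moment computed this way coincides with `sum(sample) / len(sample)`, which is the sample mean mentioned above.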


Definition 3.4 The kth central moment of a random variable is the expected 
value of the kth power of the deviation of the variable from its mean. 
It is denoted by $E[(X - \mu)^k]$ or by $\mu_k$. The second central moment is usually 
called the variance and denoted $\sigma^2$, and its square root, $\sigma$, is called the 
standard deviation. The ratio of the standard deviation to the mean is called 
the coefficient of variation. The ratio of the third central moment to the 
cube of the standard deviation, $\gamma_1 = \mu_3/\sigma^3$, is called the skewness. The ratio 
of the fourth central moment to the fourth power of the standard deviation, 
$\gamma_2 = \mu_4/\sigma^4$, is called the kurtosis.¹ 


The continuous and discrete formulas for calculating central moments are 

$$
\mu_k = E[(X - \mu)^k] = \begin{cases}
\displaystyle\int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx & \text{if the random variable is continuous,}\\[2ex]
\displaystyle\sum_j (x_j - \mu)^k\, p(x_j) & \text{if the random variable is discrete.}
\end{cases}
\tag{3.2}
$$

¹It would be more accurate to call these items the "coefficient of skewness" and "coefficient 
of kurtosis" because there are other quantities that also measure asymmetry and flatness. 
The simpler expressions will be used in this text. 


In reality, the integral need be taken only over those $x$ values where $f(x)$ is 
positive. The standard deviation is a measure of how much the probability 
is spread out over the random variable's possible values. It is measured in 
the same units as the random variable itself. The coefficient of variation 
measures the spread relative to the mean. The skewness is a measure of 
asymmetry. A symmetric distribution has a skewness of zero, while a positive 
skewness indicates that probabilities to the right tend to be assigned to values 
further from the mean than those to the left. The kurtosis measures flatness 
of the distribution relative to a normal distribution (which has a kurtosis of 
3). Kurtosis values above 3 indicate that (keeping the standard deviation 
constant), relative to a normal distribution, more probability tends to be at 
points away from the mean than at points near the mean. The coefficients of 
variation, skewness, and kurtosis are all dimensionless. 

There is a link between raw and central moments. The following equation 
indicates the connection between second moments. The development uses the 
continuous version from (3.1) and (3.2), but the result applies to all random 
variables. 


$$
\begin{aligned}
\mu_2 &= \int (x - \mu)^2 f(x)\,dx = \int (x^2 - 2x\mu + \mu^2) f(x)\,dx\\
&= E(X^2) - 2\mu E(X) + \mu^2 = \mu_2' - \mu^2.
\end{aligned}
\tag{3.3}
$$


Example 3.5 The density function of the gamma distribution appears to be 
positively skewed. Demonstrate that this is true and illustrate with graphs. 


From Appendix A, the first three raw moments of the gamma distribution 
are $\alpha\theta$, $\alpha(\alpha+1)\theta^2$, and $\alpha(\alpha+1)(\alpha+2)\theta^3$. From (3.3) the variance is $\alpha\theta^2$ and 
from the solution to Exercise 3.1 the third central moment is $2\alpha\theta^3$. Therefore, 
the skewness is $2\alpha^{-1/2}$. Because $\alpha$ must be positive, the skewness is always 
positive. Also, as $\alpha$ decreases, the skewness increases. 

Consider the following two gamma distributions. One has parameters $\alpha = 
0.5$ and $\theta = 100$ while the other has $\alpha = 5$ and $\theta = 10$. These have the same 
mean, but their skewness coefficients are 2.83 and 0.89, respectively. Figure 
3.1 demonstrates the difference. □
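
The skewness values quoted in Example 3.5 can be reproduced from the raw moments alone. The sketch below assumes the standard expansion $\mu_3 = \mu_3' - 3\mu_2'\mu + 2\mu^3$ for the third central moment (the kind of formula Exercise 3.1 asks for); the function names are illustrative.

```python
def gamma_raw_moments(alpha, theta):
    """First three raw moments of the gamma distribution, as listed in Example 3.5."""
    m1 = alpha * theta
    m2 = alpha * (alpha + 1) * theta**2
    m3 = alpha * (alpha + 1) * (alpha + 2) * theta**3
    return m1, m2, m3

def skewness_from_raw(m1, m2, m3):
    variance = m2 - m1**2                          # equation (3.3)
    third_central = m3 - 3 * m2 * m1 + 2 * m1**3   # assumed standard expansion
    return third_central / variance**1.5

skew_a = skewness_from_raw(*gamma_raw_moments(0.5, 100))
skew_b = skewness_from_raw(*gamma_raw_moments(5, 10))

print(round(skew_a, 2), round(skew_b, 2))
```

Both results should match $2\alpha^{-1/2}$ for the respective values of $\alpha$.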


Note that when calculating the standard deviation for Model 6 in Exercise 
3.2 the result is the sample standard deviation using n (as opposed to the 
more commonly used n — 1) in the denominator. Finally, it should be noted 
that when calculating moments it is possible that the integral or sum will not 
exist (as is the case for the third and fourth moments for Model 2). For the 
models we typically encounter, the integrand and summand are nonnegative 
and so failure to exist implies that the required limit that gives the integral 
or sum is infinity. See Example 4.15 for an illustration. 

Fig. 3.1 Densities of f(x) ~ gamma(0.5, 100) and g(x) ~ gamma(5, 10). 


Definition 3.6 For a given value of $d$ with $\Pr(X > d) > 0$, the excess loss 
variable is $Y = X - d$ given that $X > d$. Its expected value, 

$$e_X(d) = e(d) = E(Y) = E(X - d \mid X > d),$$

is called the mean excess loss function. Other names for this expectation 
are mean residual life function and complete expectation of life. When 
the latter terminology is used, the commonly used symbol is $\mathring{e}_d$. 


This variable could also be called a left truncated and shifted variable. 
It is left truncated because observations below $d$ are discarded. It is shifted 
because $d$ is subtracted from the remaining values. When $X$ is a payment 
variable, the mean excess loss is the expected amount paid given that there 
has been a payment in excess of a deductible of $d$. When $X$ is the age at 
death, the mean excess loss is the expected remaining time until death given 
that the person is alive at age $d$. The kth moment of the excess loss variable 
is determined from 

$$
e_X^k(d) = \begin{cases}
\dfrac{\int_d^{\infty} (x - d)^k f(x)\,dx}{1 - F(d)} & \text{if the variable is continuous,}\\[2.5ex]
\dfrac{\sum_{x_j > d} (x_j - d)^k\, p(x_j)}{1 - F(d)} & \text{if the variable is discrete.}
\end{cases}
\tag{3.4}
$$

Here, $e_X^k(d)$ is defined only if the integral or sum converges. There is a 
particularly convenient formula for calculating the first moment. The development 
is given below for the continuous version, but the result holds for all random 
variables. The second line is based on an integration by parts where the 
antiderivative of $f(x)$ is taken as $-S(x)$. 

$$
\begin{aligned}
e_X(d) &= \frac{\int_d^{\infty} (x - d) f(x)\,dx}{1 - F(d)}\\
&= \frac{-(x - d)S(x)\big|_d^{\infty} + \int_d^{\infty} S(x)\,dx}{S(d)}\\
&= \frac{\int_d^{\infty} S(x)\,dx}{S(d)}.
\end{aligned}
\tag{3.5}
$$
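
Formula (3.5) lends itself to a direct numerical check. The sketch below applies it to Model 5, whose survival function vanishes at 75, so the integral can stop there. The helper names and the midpoint rule are assumptions for illustration.

```python
def survival_model5(x):
    """Survival function of Model 5 (piecewise linear, zero from 75 onward)."""
    if x < 0:
        return 1.0
    if x < 50:
        return 1 - 0.01 * x
    if x < 75:
        return 1.5 - 0.02 * x
    return 0.0

def mean_excess_loss(survival, d, upper, steps=20_000):
    """Approximate (3.5): integral of S(x) from d to upper, divided by S(d)."""
    width = (upper - d) / steps
    integral = width * sum(survival(d + (i + 0.5) * width) for i in range(steps))
    return integral / survival(d)

e_at_0 = mean_excess_loss(survival_model5, 0.0, 75.0)    # equals the mean E(X_5)
e_at_50 = mean_excess_loss(survival_model5, 50.0, 75.0)  # lifetime beyond age 50

print(round(e_at_0, 2), round(e_at_50, 2))
```

With $d = 0$ the result should reproduce the mean of 43.75 from Example 3.2, and with $d = 50$ it should be 12.5, the midpoint of the remaining 25 units over which the conditional density is constant.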


Definition 3.7 The left censored and shifted variable is 

$$
Y = (X - d)_+ = \begin{cases}
0, & X < d,\\
X - d, & X \ge d.
\end{cases}
$$

It is left censored because values below $d$ are not ignored but are set equal 
to 0. There is no standard name or symbol for the moments of this variable. 
For dollar events, the distinction between the excess loss variable and the left 
censored and shifted variable is one of per payment versus per loss. In the 
former situation, the variable exists only when a payment is made. The latter 
variable takes on the value 0 whenever a loss produces no payment. The 
moments can be calculated from 

$$
E[(X - d)_+^k] = \begin{cases}
\displaystyle\int_d^{\infty} (x - d)^k f(x)\,dx & \text{if the variable is continuous,}\\[2ex]
\displaystyle\sum_{x_j > d} (x_j - d)^k\, p(x_j) & \text{if the variable is discrete.}
\end{cases}
\tag{3.6}
$$

It should be noted that 

$$E[(X - d)_+^k] = e^k(d)[1 - F(d)]. \tag{3.7}$$


Example 3.8 Construct graphs to illustrate the difference between the excess 
loss variable and the left censored and shifted variable. 


The two graphs in Figures 3.2 and 3.3 plot the modified variable $Y$ as 
a function of the unmodified variable $X$. The only difference is that for $X$ 
values below 100 the excess loss variable is undefined while the left censored 
and shifted variable is set equal to zero. □
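
The per payment versus per loss distinction, and relationship (3.7) with $k = 1$, can be seen on a small data set. The sketch below treats the Model 6 sample as losses and applies a hypothetical deductible of 5; both choices are assumptions made for illustration.

```python
losses = [3, 5, 6, 6, 6, 7, 7, 10]   # Model 6 sample, treated as loss amounts
d = 5                                 # hypothetical deductible

# Excess loss variable: only defined for losses above d (per payment basis).
payments = [x - d for x in losses if x > d]
mean_per_payment = sum(payments) / len(payments)

# Left censored and shifted variable: zero when no payment is made (per loss basis).
censored = [max(x - d, 0) for x in losses]
mean_per_loss = sum(censored) / len(censored)

# 1 - F(d) under the empirical model.
prob_exceeds = len(payments) / len(losses)

print(mean_per_payment, mean_per_loss, prob_exceeds)
```

The per loss mean equals the per payment mean multiplied by the probability that a payment is made, which is (3.7) for the first moment.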


The next definition provides a complementary function to the excess loss. 


Definition 3.9 The limited loss variable is 

$$
Y = X \wedge u = \begin{cases}
X, & X < u,\\
u, & X \ge u.
\end{cases}
$$

Its expected value, $E(X \wedge u)$, is called the limited expected value. 

Fig. 3.2 Excess loss variable. 


Fig. 3.3 Left censored and shifted variable. 


This variable could also be called the right censored variable. It is right 
censored because values above $u$ are set equal to $u$. An insurance phenomenon 
that relates to this variable is the existence of a policy limit that sets 
a maximum on the benefit to be paid. Note that $(X - d)_+ + (X \wedge d) = X$. 
That is, buying one policy with a limit of $d$ and another with a deductible of 
$d$ is equivalent to buying full coverage. This is illustrated in Figure 3.4. 

Fig. 3.4 Limit of 100 plus deductible of 100 equals full coverage. 


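The most direct formulas for the kth moment of the limited loss variable 
are 

$$
E[(X \wedge u)^k] = \begin{cases}
\displaystyle\int_{-\infty}^{u} x^k f(x)\,dx + u^k[1 - F(u)] & \text{if the random variable is continuous,}\\[2ex]
\displaystyle\sum_{x_j \le u} x_j^k\, p(x_j) + u^k[1 - F(u)] & \text{if the random variable is discrete.}
\end{cases}
\tag{3.8}
$$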
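
A small check of the discrete case of (3.8), again using the Model 6 sample with a hypothetical limit of 6, is sketched below. It also confirms that the limited and the left censored and shifted means add up to the full mean, as the identity $(X - d)_+ + (X \wedge d) = X$ requires. Function names are illustrative.

```python
from fractions import Fraction

sample = [3, 5, 6, 6, 6, 7, 7, 10]   # Model 6 data
pmf = {x: Fraction(sample.count(x), len(sample)) for x in set(sample)}

def limited_moment(pmf, u, k=1):
    """Discrete case of (3.8): sum over x_j <= u plus u^k [1 - F(u)]."""
    below = sum(x**k * p for x, p in pmf.items() if x <= u)
    prob_above = sum(p for x, p in pmf.items() if x > u)
    return below + u**k * prob_above

def censored_shifted_mean(pmf, d):
    """First moment of (X - d)_+ for a discrete model, from (3.6)."""
    return sum((x - d) * p for x, p in pmf.items() if x > d)

u = 6                                 # hypothetical policy limit
limited_mean = limited_moment(pmf, u)
direct_mean = Fraction(sum(min(x, u) for x in sample), len(sample))
full_mean = sum(x * p for x, p in pmf.items())

print(limited_mean, direct_mean, censored_shifted_mean(pmf, u), full_mean)
```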


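Another interesting formula is derived as follows: 

$$
\begin{aligned}
E[(X \wedge u)^k] &= \int_{-\infty}^{0} x^k f(x)\,dx + \int_0^u x^k f(x)\,dx + u^k[1 - F(u)]\\
&= x^k F(x)\Big|_{-\infty}^{0} - \int_{-\infty}^{0} k x^{k-1} F(x)\,dx
 - x^k S(x)\Big|_0^u + \int_0^u k x^{k-1} S(x)\,dx + u^k S(u)\\
&= -\int_{-\infty}^{0} k x^{k-1} F(x)\,dx + \int_0^u k x^{k-1} S(x)\,dx,
\end{aligned}
\tag{3.9}
$$

where the second line uses integration by parts. For $k = 1$, we have 

$$E(X \wedge u) = -\int_{-\infty}^{0} F(x)\,dx + \int_0^u S(x)\,dx.$$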
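
For a nonnegative variable the first integral vanishes and the limited expected value is simply the area under the survival function up to $u$. The sketch below compares this with the direct formula (3.8) for Model 5 at a hypothetical limit of 60; the helper names, the midpoint rule, and the step count are assumptions.

```python
def density_model5(x):
    if 0 <= x < 50:
        return 0.01
    if 50 <= x < 75:
        return 0.02
    return 0.0

def survival_model5(x):
    if x < 0:
        return 1.0
    if x < 50:
        return 1 - 0.01 * x
    if x < 75:
        return 1.5 - 0.02 * x
    return 0.0

def midpoint_integral(f, lower, upper, steps=30_000):
    width = (upper - lower) / steps
    return width * sum(f(lower + (i + 0.5) * width) for i in range(steps))

def lev_direct(u):
    """Formula (3.8) with k = 1 for a nonnegative continuous variable."""
    return midpoint_integral(lambda x: x * density_model5(x), 0.0, u) + u * survival_model5(u)

def lev_from_survival(u):
    """Formula (3.9) with k = 1; the integral over negative x is zero here."""
    return midpoint_integral(survival_model5, 0.0, u)

lev_60_direct = lev_direct(60.0)
lev_60_survival = lev_from_survival(60.0)

print(round(lev_60_direct, 4), round(lev_60_survival, 4))
```

Both routes should give 41.5 for this limit.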


The corresponding formula for discrete random variables is not particularly 
interesting. The limited expected value also represents the expected dollar 
saving per incident when a deductible is imposed. The kth limited moment 
of many common continuous distributions is presented in Appendix A. 
Exercise 3.8 asks you to develop a relationship between the three first moments 
introduced previously. 


3.1.1 Exercises 

3.1 Develop formulas similar to (3.3) for $\mu_3$ and $\mu_4$. 


3.2 Calculate the standard deviation, skewness, and kurtosis for each of the 
six models. It may help to note that Model 2 is a Pareto distribution and the 
density function in the continuous part of Model 4 is an exponential distrib- 
ution. Formulas that may help with calculations for these models appear in 
Appendix A. 


3.3 (*) A random variable has a mean and a coefficient of variation of 2. The 
third raw moment is 136. Determine the skewness. 


3.4 (*) Determine the skewness of a gamma distribution that has a coefficient 
of variation of 1. 


3.5 Determine the mean excess loss function for Models 1-4. Compare the 
functions for Models 1, 2, and 4. 


3.6 (*) For two random variables, $X$ and $Y$, $e_Y(30) = e_X(30) + 4$. Let $X$ 
have a uniform distribution on the interval from 0 to 100 and let $Y$ have a 
uniform distribution on the interval from 0 to $w$. Determine $w$. 


3.7 (*) A random variable has density function $f(x) = \lambda^{-1}e^{-x/\lambda}$, $x, \lambda > 0$. 
Determine $e(\lambda)$, the mean residual life function evaluated at $\lambda$. 


3.8 Show that the following relationship holds: 
$$E(X) = e(d)S(d) + E(X \wedge d). \tag{3.10}$$


3.9 Determine the limited expected value function for Models 1-4. Do this 
using both (3.8) and (3.10). For Models 1 and 2 also obtain the function using 
(3.9). 


3.10 (*) Which of the following statements are true? 

(a) The mean residual life function for an empirical distribution is 
continuous. 

(b) The mean residual life function for an exponential distribution is 
constant. 

(c) If it exists, the mean residual life function for a Pareto distribution 
is decreasing. 


3.11 (*) Losses have a Pareto distribution with $\alpha = 0.5$ and $\theta = 10{,}000$. 
Determine the mean residual life at 10,000. 


3.12 Define a right truncated variable and provide a formula for its kth 
moment. 


3.13 (*) The severity distribution of individual claims has pdf 
$f(x) = 2.5x^{-3.5}$, $x \ge 1$. 
Determine the coefficient of variation. 


3.14 (*) Claim sizes are for 100, 200, 300, 400, or 500. The true probabilities 
for these values are 0.05, 0.20, 0.50, 0.20, and 0.05, respectively. Determine 
the skewness and kurtosis for this distribution. 


3.15 (*) Losses follow a Pareto distribution with $\alpha > 1$ and $\theta$ unspecified. 
Determine the ratio of the mean excess loss function at $x = 2\theta$ to the mean 
excess loss function at $x = \theta$. 


3.2 PERCENTILES 


One other value of interest that may be derived from the distribution function 
is the percentile function. It is the inverse of the distribution function, but 
because this quantity is not well defined, an arbitrary definition must be 
created. 


Definition 3.10 The 100pth percentile of a random variable is any value 
$\pi_p$ such that $F(\pi_p-) \le p \le F(\pi_p)$. The 50th percentile, $\pi_{0.5}$, is called the 
median. 


If the distribution function has a value of p for one and only one z value, 
then the percentile is uniquely defined. In addition, if the distribution function 
jumps from a value below p to a value above p, then the percentile is at the 
location of the jump. The only time the percentile is not uniquely defined 
is when the distribution function is constant at a value of p over a range of 
values. In that case, any value in that range can be used as the percentile. 


Example 3.11 Determine the 50th and 80th percentiles for Models 1 and 3. 


For Model 1, the pth percentile can be obtained from $p = F(\pi_p) = 0.01\pi_p$, 
and so $\pi_p = 100p$, and in particular, the requested percentiles are 50 and 
80 (see Figure 3.5). For Model 3 the distribution function equals 0.5 for all 
$0 \le x < 1$ and so all such values can be the 50th percentile. For the 80th 
percentile, note that at $x = 2$ the distribution function jumps from 0.75 to 
0.87 and so $\pi_{0.8} = 2$ (see Figure 3.6). □
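
Definition 3.10 can be applied mechanically to a discrete model. The sketch below lists the support points of Model 3 that satisfy $F(\pi_p-) \le p \le F(\pi_p)$; it uses exact fractions so that the boundary comparisons are not affected by rounding. The function name is an assumption, and only support points are tested, so when two adjacent support points are returned every value between them also qualifies.

```python
from fractions import Fraction

model3 = {
    0: Fraction("0.50"),
    1: Fraction("0.25"),
    2: Fraction("0.12"),
    3: Fraction("0.08"),
    4: Fraction("0.05"),
}

def percentile_candidates(pmf, p):
    """Support points x with F(x-) <= p <= F(x), per Definition 3.10."""
    candidates = []
    cumulative = Fraction(0)
    for x in sorted(pmf):
        left_limit = cumulative      # F(x-), the cdf just below x
        cumulative += pmf[x]         # F(x)
        if left_limit <= p <= cumulative:
            candidates.append(x)
    return candidates

median_points = percentile_candidates(model3, Fraction("0.5"))
p80_points = percentile_candidates(model3, Fraction("0.8"))

print(median_points, p80_points)
```

For the median the function returns the endpoints 0 and 1 of the flat stretch of the distribution function; for the 80th percentile it returns the single jump point 2.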
Fig. 3.5 Percentiles for Model 1. 


Fig. 3.6 Percentiles for Model 3. 


3.2.1 Exercises 


3.16 (*) The cdf of a random variable is $F(x) = 1 - x^{-2}$, $x \ge 1$. Determine 
the mean, median, and mode of this random variable. 


3.17 Determine the 50th and 80th percentiles for Models 2, 4, 5, and 6. 


3.18 (*) Losses have a Pareto distribution with parameters $\alpha$ and $\theta$. The 
10th percentile is $\theta - k$. The 90th percentile is $5\theta - 3k$. Determine the value 
of $\alpha$. 


3.19 (*) Losses have a Weibull distribution with parameters $\tau$ and $\theta$. The 
25th percentile is 1,000 and the 75th percentile is 100,000. Determine the 
value of $\tau$. 


3.3 GENERATING FUNCTIONS AND SUMS OF RANDOM 
VARIABLES 


An insurance company rarely insures only one person. The total claims paid 
on all policies is the sum of all payments. Thus it is useful to be able to 
determine properties of $S_k = X_1 + \cdots + X_k$. The first result is a version of 
the central limit theorem. 


Theorem 3.12 For a random variable $S_k$ as defined above, $E(S_k) = E(X_1) + 
\cdots + E(X_k)$. Also, if $X_1, \ldots, X_k$ are independent, $\mathrm{Var}(S_k) = \mathrm{Var}(X_1) + \cdots 
+ \mathrm{Var}(X_k)$. If the random variables $X_1, X_2, \ldots$ are independent and their first 
two moments meet certain conditions, $\lim_{k \to \infty} [S_k - E(S_k)]/\sqrt{\mathrm{Var}(S_k)}$ has a 
normal distribution with mean 0 and variance 1. 
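
The additivity of means and variances in Theorem 3.12 can be verified exactly for a small discrete case. The sketch below builds the distribution of the sum of three independent Model 3 variables by repeated convolution; the choice of $k = 3$ and the helper names are assumptions for illustration.

```python
model3 = {0: 0.50, 1: 0.25, 2: 0.12, 3: 0.08, 4: 0.05}

def convolve(pmf_a, pmf_b):
    """Probability function of A + B for independent discrete A and B."""
    result = {}
    for xa, pa in pmf_a.items():
        for xb, pb in pmf_b.items():
            result[xa + xb] = result.get(xa + xb, 0.0) + pa * pb
    return result

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

k = 3
sum_pmf = {0: 1.0}            # distribution of an empty sum
for _ in range(k):
    sum_pmf = convolve(sum_pmf, model3)

print(round(mean(sum_pmf), 4), round(k * mean(model3), 4))
print(round(variance(sum_pmf), 4), round(k * variance(model3), 4))
```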


Obtaining the distribution or density function of $S_k$ is usually very difficult. 
However, there are a few cases where it is simple. The key to this simplicity 
is the generating function. 


Definition 3.13 For a random variable $X$, the moment generating function 
(mgf) is $M_X(t) = E(e^{tX})$ for all $t$ for which the expected value exists. 
The probability generating function (pgf) is $P_X(z) = E(z^X)$ for all $z$ for 
which the expectation exists. 


Note that $M_X(t) = P_X(e^t)$ and $P_X(z) = M_X(\ln z)$. Often the mgf is used 
for continuous random variables and the pgf for discrete random variables. For 
us, the value of these functions is not so much that they generate moments or 
probabilities but that there is a one-to-one correspondence between a random 
variable's distribution function and its mgf and pgf (i.e., two random variables 
with different distribution functions cannot have the same mgf or pgf). The 
following result aids in working with sums of random variables. 


Theorem 3.14 Let $S_k = X_1 + \cdots + X_k$, where the random variables in the 
sum are independent. Then $M_{S_k}(t) = \prod_{j=1}^{k} M_{X_j}(t)$ and $P_{S_k}(z) = \prod_{j=1}^{k} P_{X_j}(z)$ 
provided all the component mgfs and pgfs exist. 


Proof: We use the fact that the expected product of independent random 
variables is the product of the individual expectations. Then, 

$$
\begin{aligned}
M_{S_k}(t) &= E(e^{tS_k}) = E[e^{t(X_1 + \cdots + X_k)}]\\
&= \prod_{j=1}^{k} E(e^{tX_j}) = \prod_{j=1}^{k} M_{X_j}(t).
\end{aligned}
$$

A similar argument can be used for the pgf. □


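Example 3.15 Show that the sum of independent gamma random variables, 
each with the same value of $\theta$, has a gamma distribution. 

The moment generating function of a gamma variable is 

$$
\begin{aligned}
E(e^{tX}) &= \int_0^{\infty} \frac{e^{tx} x^{\alpha - 1} e^{-x/\theta}}{\Gamma(\alpha)\theta^{\alpha}}\,dx\\
&= \int_0^{\infty} \frac{x^{\alpha - 1} e^{-x(-t + 1/\theta)}}{\Gamma(\alpha)\theta^{\alpha}}\,dx\\
&= \int_0^{\infty} \frac{y^{\alpha - 1} (-t + 1/\theta)^{-\alpha} e^{-y}}{\Gamma(\alpha)\theta^{\alpha}}\,dy\\
&= \frac{\Gamma(\alpha)(-t + 1/\theta)^{-\alpha}}{\Gamma(\alpha)\theta^{\alpha}}
 = \left(\frac{1}{1 - \theta t}\right)^{\alpha}, \quad t < 1/\theta.
\end{aligned}
$$

Now let $X_j$ have a gamma distribution with parameters $\alpha_j$ and $\theta$. Then the 
moment generating function of the sum is 

$$
M_{S_k}(t) = \prod_{j=1}^{k} \left(\frac{1}{1 - \theta t}\right)^{\alpha_j}
= \left(\frac{1}{1 - \theta t}\right)^{\alpha_1 + \cdots + \alpha_k},
$$

which is the moment generating function of a gamma distribution with 
parameters $\alpha_1 + \cdots + \alpha_k$ and $\theta$. □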


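Example 3.16 Obtain the mgf and pgf for the Poisson distribution. 


The pgf is 

$$
P_X(z) = \sum_{x=0}^{\infty} z^x \frac{\lambda^x e^{-\lambda}}{x!}
= e^{-\lambda} \sum_{x=0}^{\infty} \frac{(z\lambda)^x}{x!}
= e^{-\lambda} e^{z\lambda} = e^{\lambda(z - 1)}.
$$

Then the mgf is $M_X(t) = P_X(e^t) = \exp[\lambda(e^t - 1)]$. □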
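
The closed form of the Poisson pgf can be compared against a truncated version of the defining series, and Theorem 3.14 can be checked for two independent Poisson variables. In the sketch below the parameter values, the truncation at 100 terms, and the function names are assumptions for illustration.

```python
import math

def poisson_pgf_series(lam, z, terms=100):
    """Truncated series sum of z^x * lam^x * exp(-lam) / x! over x = 0, 1, ..."""
    total = 0.0
    term = math.exp(-lam)            # the x = 0 term
    for x in range(terms):
        total += term
        term *= z * lam / (x + 1)    # ratio of consecutive terms
    return total

def poisson_pgf(lam, z):
    return math.exp(lam * (z - 1))

def poisson_mgf(lam, t):
    return poisson_pgf(lam, math.exp(t))   # M_X(t) = P_X(e^t)

lam, z = 2.5, 0.7
series_value = poisson_pgf_series(lam, z)
closed_value = poisson_pgf(lam, z)

# Theorem 3.14: the pgf of a sum of independent variables is the product of pgfs.
product_value = poisson_pgf(1.5, z) * poisson_pgf(2.0, z)
combined_value = poisson_pgf(3.5, z)

print(series_value, closed_value, product_value, combined_value)
```

The last comparison shows that the product of two Poisson pgfs is again a Poisson pgf, with the parameters added.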


3.3.1 Exercises 


3.20 (*) A portfolio contains 16 independent risks, each with a gamma 
distribution with parameters $\alpha = 1$ and $\theta = 250$. Give an expression using 
the incomplete gamma function for the probability that the sum of the losses 
exceeds 6,000. Then approximate this probability using the central limit 
theorem. 


3.21 (*) The severities of individual claims have the Pareto distribution with 
parameters $\alpha = 8/3$ and $\theta = 8{,}000$. Use the central limit theorem to 
approximate the probability that the sum of 100 independent claims will exceed 
600,000. 


3.22 (*) The severities of individual claims have the gamma distribution (see 
Appendix A) with parameters $\alpha = 5$ and $\theta = 1{,}000$. Use the central limit 
theorem to approximate the probability that the sum of 100 independent 
claims exceeds 525,000. 


3.23 A sample of 1,000 health insurance contracts on adults produced a sam- 
ple mean of 1,300 for the annual benefits paid with a standard deviation of 
400. It is expected that 2,500 contracts will be issued next year. Use the 
central limit theorem to estimate the probability that benefit payments will 
be more than 101% of the expected amount. 


3.24 Show that the mgf for the inverse Gaussian distribution with $\theta$ replaced 
by $\beta = \mu^2/\theta$ is given by 

$$M(t) = \exp\left[\frac{\mu}{\beta}\left(1 - \sqrt{1 - 2\beta t}\right)\right], \quad t < 1/(2\beta).$$


Classifying and creating 
distributions 


4.1 INTRODUCTION 


The set of all possible distribution functions is too large to comprehend. 
Therefore, when searching for a distribution function to use as a model for 
a random phenomenon, it can be helpful if the field can be narrowed. One 
division that has already been discussed is the separation into discrete, contin- 
uous, and mixed distributions. In most situations it will be obvious which of 
the three applies. Beyond this, we need more artificial distinctions. The next 
section describes a split based on the complexity of the model. The following 
section then looks at the shape of the distribution to distinguish one from 
another. After that, a few methods of creating additional distributions are 
introduced. This is followed by a listing of some commonly used continuous 
distributions. The final section is an extensive treatment of discrete distrib- 
utions. By the end of this chapter, most of the distributions in Appendices 
A and B will have been introduced. While this chapter is more about differ- 
ences from one distribution to another, these distributions have some common 
elements that are desirable for actuarial models. Among them are: 


• The support is a subset of the nonnegative real numbers. Most actuarial 
phenomena are counts or are measurements of time or money and as 
such are rarely negative, although if the random variable of interest 
is financial gain, negative outcomes are certainly possible. However, 
financial gain is often the result of the realization of several nonnegative 
variables, some measuring income and some measuring expenses. 


• Some distributions are special cases of others. This allows the modeler 
to choose from models of varying complexity. 


• They can have one or more modes and a mode may or may not be at 
zero. 


Loss Models: From Data to Decisions, Second Edition. 
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot 
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc. 


4.2 THE ROLE OF PARAMETERS 


This split has to do with how much information is needed to specify the model. 
The number of quantities (parameters) needed to do so gives some indication 
of how complex a model is, in the sense that many items are needed to describe 
it. Arguments for a simple model include the following: 


• With few items required in its specification, it is more likely that each 
item can be determined more accurately. 


• It is more likely to be stable across time and across settings. That is, 
if the model does well today, it (perhaps with small changes to reflect 
inflation or similar phenomena) will probably do well tomorrow and will 
also do well in other, similar, situations. 


• Because data can often be irregular, a simple model may provide 
necessary smoothing. 


Of course, complex models also have advantages. 


• With many items required in its specification, it can more closely match 
reality. 


• With many items required in its specification, it can more closely match 
irregularities in the data. 


Another way to express the difference is that simpler models can be es- 
timated more accurately, but the model itself may be too superficial. The 
principle of parsimony states that the simplest model that adequately reflects 
reality should be used. The definition of “adequately” will depend on the 
purpose for which the model is to be used. 

In the following subsections, we will move from simpler models to more 
complex models. There is some difficulty in naming the various classifica- 
tions because there is not universal agreement on the definitions. With the 
exception of parametric distributions, the other category names have been 
created by the authors. It should also be understood that these categories 
do not cover the universe of possible models nor will every model be easy to 
categorize. These should be considered as qualitative descriptions. 

4.2.1 Parametric and scale distributions 


These models are simple enough to be specified by a few key numbers. 


Definition 4.1 A parametric distribution is a set of distribution func- 
tions, each member of which is determined by specifying one or more values 
called parameters. The number of parameters is fixed and finite. 


The most familiar parametric distribution is the normal distribution with
parameters $\mu$ and $\sigma^2$. When values for these two parameters are specified,
the distribution function is completely known.

These are the simplest distributions in this subsection, because typically 
only a small number of values need to be specified. All of the individual 
distributions in Appendices A and B are parametric. Within this class, distri- 
butions with fewer parameters are simpler than those with more parameters. 

For much of actuarial modeling work, it is especially convenient if the name 
of the distribution is unchanged when the random variable is multiplied by a 
constant. The most common uses for this phenomenon are to model the effect 
of inflation and to accommodate a change in the monetary unit. 


Definition 4.2 A parametric distribution is a scale distribution if, when 
a random variable from that set of distributions is multiplied by a positive 
constant, the resulting random variable is also in that set of distributions. 


Example 4.3 Demonstrate that the exponential distribution is a scale distri- 
bution. 


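According to Appendix A, the distribution function is $F_X(x) = 1 - e^{-x/\theta}$.
Let $Y = cX$, where $c > 0$. Then,

\[
\begin{aligned}
F_Y(y) &= \Pr(Y \le y) \\
       &= \Pr(cX \le y) \\
       &= \Pr\left(X \le \frac{y}{c}\right) \\
       &= 1 - e^{-y/(c\theta)}.
\end{aligned}
\]

But this is an exponential distribution with parameter $c\theta$. □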
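
The scale property of Example 4.3 can also be checked numerically. The short Python sketch below is only an illustration: the function names and the values $\theta = 100$ and $c = 1.05$ are assumptions chosen for the check and do not come from the text. It compares the cdf of $Y = cX$, computed as $\Pr(X \le y/c)$, with the exponential cdf that has parameter $c\theta$.

```python
import math

def exponential_cdf(x, theta):
    """Exponential cdf F(x) = 1 - exp(-x/theta) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

def scaled_cdf(y, c, theta):
    """cdf of Y = cX, obtained directly as Pr(X <= y/c)."""
    return exponential_cdf(y / c, theta)

theta, c = 100.0, 1.05  # hypothetical parameter and inflation factor
# Largest difference between the two cdfs over a few test points.
max_gap = max(
    abs(scaled_cdf(y, c, theta) - exponential_cdf(y, c * theta))
    for y in (1.0, 50.0, 100.0, 500.0, 2000.0)
)
print(max_gap)
```

The two cdfs agree up to floating-point rounding, as the algebra requires.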


Definition 4.4 For random variables with nonnegative support, a scale pa- 
rameter is a parameter for a scale distribution that meets two conditions. 
First, when a member of the scale distribution is multiplied by a positive con- 
stant, the scale parameter is multiplied by the same constant. Second, when a 
member of the scale distribution is multiplied by a positive constant, all other 
parameters are unchanged. 


Example 4.5 Demonstrate that the gamma distribution, as defined in
Appendix A, has a scale parameter.


Let X have the gamma distribution and $Y = cX$. Then, using the incomplete
gamma notation in Appendix A,

\[
F_Y(y) = \Pr\left(X \le \frac{y}{c}\right) = \Gamma\left(\alpha; \frac{y}{c\theta}\right),
\]

indicating that Y has a gamma distribution with parameters $\alpha$ and $c\theta$.
Therefore, the parameter $\theta$ is a scale parameter. □


Many textbooks write the density function for the gamma distribution as

\[
f(x) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}.
\]


We have chosen to use the version of the density function that has a scale
parameter. When a random variable with the alternative version is multiplied
by c, the parameters become $\alpha$ and $\beta/c$. As well, the mean is proportional
to $\theta$ in our version, while it is proportional to $1/\beta$ in the alternative
version. Our version makes it easier to get ballpark estimates of this
parameter, although, for the alternative definition, one need only keep in
mind that the parameter is inversely proportional to the mean.

It is often possible to recognize a scale parameter from looking at the
distribution or density function. In particular, the distribution function would
have x always appear as $x/\theta$.


4.2.2 Parametric distribution families 


A slightly more complex version of a parametric distribution is one in which 
the number of parameters is finite but not fixed in advance. 


Definition 4.6 A parametric distribution family is a set of parametric 
distributions that are related in some meaningful way. 


The most common type of parametric distribution family is described in 
the following example. 


Example 4.7 One type of parametric distribution family is based on a spec- 
ified parametric distribution. Other members of the family are obtained by 
setting one or more parameters from the specified distribution equal to a pre- 
set value or to each other. Demonstrate that the transformed beta family as 
defined in Appendix A is a parametric distribution family. 
The transformed beta distribution has four parameters. Each of the other
named distributions in the family is a transformed beta distribution with
certain parameters set equal to 1 (for example, the Pareto distribution has
$\gamma = \tau = 1$) or to each other (the paralogistic distribution has $\tau = 1$ and
$\gamma = \alpha$). Note that the number of parameters (ranging from two to four) is
not known in advance. There is a subtle difference in definitions. A modeler
who uses the transformed beta distribution looks at all four parameters over
their range of possible values. A modeler who uses the transformed beta
family pays particular attention to the possibility of using special cases such
as the Burr distribution. For example, if the former modeler collects some
data and decides that $\tau = 1.01$, that will be the value to use. The latter
modeler will note that $\tau = 1$ gives a Burr distribution and will likely use that
model instead. □


4.2.3 Finite mixture distributions 


By themselves, mixture distributions are no more complex, but later in this 
subsection we will find a way to increase the complexity level. One motiva- 
tion for mixing is that the underlying phenomenon may actually be several 
phenomena that occur with unknown probabilities. For example, a randomly 
selected dental claim may be from a check-up, from a filling, from a repair 
(such as a crown), or from a surgical procedure. Because of the differing 
modes for these possibilities, a mixture model may work well. 


Definition 4.8 A random variable Y is a k-point mixture¹ of the random
variables $X_1, X_2, \ldots, X_k$ if its cdf is given by

\[
F_Y(y) = a_1 F_{X_1}(y) + a_2 F_{X_2}(y) + \cdots + a_k F_{X_k}(y), \tag{4.1}
\]

where all $a_j > 0$ and $a_1 + a_2 + \cdots + a_k = 1$.


This essentially assigns probability $a_j$ to the outcome that Y is a realization
of the random variable $X_j$. Note that, if we have 20 choices for a given random
variable, a two-point mixture allows us to create over 200 new distributions.²
This may be sufficient for most modeling situations. Nevertheless, these are
still parametric distributions, though perhaps with many parameters.


Example 4.9 For models involving general liability insurance, actuaries at
the Insurance Services Office have had some success with a mixture of two


¹The words “mixed” and “mixture” have been used interchangeably to refer to the type
of distribution described here as well as distributions that are partly discrete and partly 
continuous. This text will not attempt to resolve that confusion. The context will make 
clear which type of distribution is being considered. 

²There are actually $\binom{20}{2} + 20 = 210$ choices. The extra 20 represent the cases where both
distributions are of the same type but with different parameters. 
Pareto distributions. They also found that five parameters were not necessary.
The distribution they selected has cdf

\[
F(x) = 1 - a\left(\frac{\theta_1}{\theta_1 + x}\right)^{\alpha} - (1 - a)\left(\frac{\theta_2}{\theta_2 + x}\right)^{\alpha + 2}.
\]

Note that the shape parameters in the two Pareto distributions differ by 2.
The second distribution places more probability on smaller values. This might
be a model for frequent, small claims while the first distribution covers large,
but infrequent claims. This distribution has only four parameters, bringing
some parsimony to the modeling process. □
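
As an illustration of Example 4.9, the following Python sketch evaluates the cdf of a mixture of two Pareto distributions whose shape parameters are $\alpha$ and $\alpha + 2$. The function names and the parameter values are hypothetical and are used only to show how the four parameters $(a, \alpha, \theta_1, \theta_2)$ enter the calculation; they are not fitted values from the Insurance Services Office.

```python
def pareto_sf(x, alpha, theta):
    """Pareto survival function S(x) = (theta / (theta + x)) ** alpha."""
    return (theta / (theta + x)) ** alpha

def pareto_mixture_cdf(x, a, alpha, theta1, theta2):
    """cdf of the four-parameter mixture with shapes alpha and alpha + 2."""
    return (1.0
            - a * pareto_sf(x, alpha, theta1)
            - (1.0 - a) * pareto_sf(x, alpha + 2.0, theta2))

# Hypothetical parameter values, chosen only for illustration.
a, alpha, theta1, theta2 = 0.3, 1.5, 1000.0, 200.0
values = [pareto_mixture_cdf(x, a, alpha, theta1, theta2)
          for x in (0.0, 100.0, 1000.0, 1e6)]
print(values)
```

Because each component is a proper distribution and the weights sum to 1, the result starts at 0, increases with x, and approaches 1.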


Suppose we do not know how many distributions should be in the mix- 
ture. Then the value of k becomes a parameter, as indicated in the following 
definition. 


Definition 4.10 A variable-component mixture distribution has a
distribution function that can be written as

\[
F(x) = \sum_{j=1}^{K} a_j F_j(x), \quad \sum_{j=1}^{K} a_j = 1, \quad a_j > 0, \quad j = 1, \ldots, K, \quad K = 1, 2, \ldots.
\]


These models have been called semiparametric because in complexity they 
are between parametric models and nonparametric models (see the next sub- 
section). This distinction becomes more important when model selection is 
discussed in Chapter 13. When the number of parameters is to be estimated 
from data, hypothesis tests to determine the appropriate number of para- 
meters become more difficult. When all of the components have the same 
parametric distribution (but different parameters), the resulting distribution 
is called a “variable mixture of gs” distribution, where g stands for the name 
of the component distribution. 


Example 4.11 Determine the distribution, density, and hazard rate func- 
tions for the variable mixture of exponentials distribution. 


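A combination of exponential distribution functions can be written

\[
F(x) = 1 - a_1 e^{-x/\theta_1} - a_2 e^{-x/\theta_2} - \cdots - a_K e^{-x/\theta_K},
\]
\[
\sum_{j=1}^{K} a_j = 1, \quad a_j, \theta_j > 0, \quad j = 1, \ldots, K, \quad K = 1, 2, \ldots,
\]

and then the other functions are

\[
f(x) = a_1\theta_1^{-1} e^{-x/\theta_1} + a_2\theta_2^{-1} e^{-x/\theta_2} + \cdots + a_K\theta_K^{-1} e^{-x/\theta_K},
\]
\[
h(x) = \frac{a_1\theta_1^{-1} e^{-x/\theta_1} + a_2\theta_2^{-1} e^{-x/\theta_2} + \cdots + a_K\theta_K^{-1} e^{-x/\theta_K}}{a_1 e^{-x/\theta_1} + a_2 e^{-x/\theta_2} + \cdots + a_K e^{-x/\theta_K}}.
\]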
Fig. 4.1 Two-point mixture of gammas distribution.


The number of parameters is not fixed nor is it even limited. For example,
when $K = 2$ there are three parameters $(a_1, \theta_1, \theta_2)$, noting that $a_2$ is not a
parameter because once $a_1$ is set the value of $a_2$ is determined. However,
when $K = 4$ there are seven parameters. □
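
The three functions of Example 4.11 translate directly into code. The Python sketch below is a minimal illustration; the function name `mixture_exp_functions` and the values used for the weights and means are assumptions introduced here. It returns the cdf, pdf, and hazard rate of a mixture of exponentials for any number of components K.

```python
import math

def mixture_exp_functions(weights, thetas):
    """Return (cdf, pdf, hazard) for a mixture of exponentials.

    weights are the a_j (positive, summing to 1); thetas are the means theta_j.
    """
    if (abs(sum(weights) - 1.0) > 1e-12 or min(weights) <= 0
            or min(thetas) <= 0 or len(weights) != len(thetas)):
        raise ValueError("need positive weights summing to 1 and positive thetas")

    def survival(x):
        return sum(a * math.exp(-x / t) for a, t in zip(weights, thetas))

    def cdf(x):
        return 1.0 - survival(x)

    def pdf(x):
        return sum(a / t * math.exp(-x / t) for a, t in zip(weights, thetas))

    def hazard(x):
        return pdf(x) / survival(x)

    return cdf, pdf, hazard

# K = 2 with hypothetical values: three free parameters (a_1, theta_1, theta_2).
cdf, pdf, hazard = mixture_exp_functions([0.6, 0.4], [50.0, 500.0])
print([hazard(x) for x in (0.0, 10.0, 100.0, 1000.0)])
```

For these values the hazard rate starts at $a_1/\theta_1 + a_2/\theta_2$ and decreases toward $1/500$, the reciprocal of the larger mean.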


Example 4.12 Illustrate how a two-point mixture of gamma variables can
create a bimodal distribution.

Consider a fifty-fifty mixture of two gamma distributions. One has
parameters $\alpha = 4$ and $\theta = 7$ (for a mode of 21) and the other has parameters
$\alpha = 15$ and $\theta = 7$ (for a mode of 98). The density function is

\[
f(x) = 0.5\,\frac{x^{3} e^{-x/7}}{3!\,7^{4}} + 0.5\,\frac{x^{14} e^{-x/7}}{14!\,7^{15}},
\]

and a graph appears in Figure 4.1. □
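
The bimodality claimed in Example 4.12 can be confirmed by evaluating the mixture density on a grid and locating its local maxima. The Python sketch below is an illustration under assumptions of our own (the grid spacing of 0.5 and the helper names are not from the text).

```python
import math

def gamma_pdf(x, alpha, theta):
    """Gamma density with shape alpha and scale theta, computed on the log scale."""
    if x <= 0:
        return 0.0
    log_pdf = ((alpha - 1) * math.log(x) - x / theta
               - alpha * math.log(theta) - math.lgamma(alpha))
    return math.exp(log_pdf)

def mixture_pdf(x):
    """Fifty-fifty mixture of gamma(4, 7) and gamma(15, 7)."""
    return 0.5 * gamma_pdf(x, 4, 7) + 0.5 * gamma_pdf(x, 15, 7)

grid = [0.5 * i for i in range(1, 401)]  # 0.5, 1.0, ..., 200.0
dens = [mixture_pdf(x) for x in grid]
# A grid point is a local maximum if it beats both of its neighbors.
local_maxima = [grid[i] for i in range(1, len(grid) - 1)
                if dens[i - 1] < dens[i] > dens[i + 1]]
print(local_maxima)
```

Two local maxima appear, one close to each component mode, because the component modes $(\alpha - 1)\theta = 21$ and $98$ are far apart relative to the spread of each component.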


4.2.4 Data-dependent distributions


Models 1-5 and many of the examples rely on an associated phenomenon (the
random variable) but not on observations of that phenomenon. For example,
without having observed any dental claims, we could postulate a lognormal
distribution with parameters $\mu = 5$ and $\sigma = 1$. Our model may be a poor
description of dental claims, but that is a different matter. On the other hand,
it is possible to construct models that require data. These models also have
parameters but are often called nonparametric.


Definition 4.13 A data-dependent distribution is at least as complex as 
the data or knowledge that produced it, and the number of “parameters” in- 
creases as the number of data points or amount of knowledge increases. 
Essentially, these models have as many (or more) “parameters” as
observations in the data set. The empirical distribution as illustrated by Model
6 on page 27 is a data-dependent distribution. Each data point contributes 
probability 1/n to the probability function, so the n parameters are the n 
observations in the data set that produced the empirical distribution. 

Another example of a data-dependent model is the kernel smoothing model, 
which is covered in more detail in Section 11.3. Rather than place a spike of 
probability 1/n at each data point, a continuous density function with area 
1/n replaces the data point. This piece is centered at the data point so that 
this model follows the data, but not perfectly. It provides some smoothing
versus the empirical distribution. A simple example is given below. 


Example 4.14 Construct a kernel smoothing model from Model 6 using the 
uniform kernel and a bandwidth of 2. 


The probability density function is

\[
f(x) = \sum_{j=1}^{5} p_6(x_j) K_j(x),
\qquad
K_j(x) =
\begin{cases}
0, & |x - x_j| > 2, \\
0.25, & |x - x_j| \le 2,
\end{cases}
\]

where the sum is taken over the five points where the original model has
positive probability. For example, the first term of the sum is the function

\[
p_6(x_1) K_1(x) =
\begin{cases}
0, & x < 1, \\
0.03125, & 1 \le x \le 5, \\
0, & x > 5.
\end{cases}
\]


The complete density function is the sum of five such functions, which are
illustrated in Figure 4.2. □
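
The construction in Example 4.14 can be sketched in a few lines of Python. Model 6 itself is not reproduced in this section, so the support points and probabilities below are hypothetical stand-ins (only the first point, 3 with probability 0.125, matches the first term shown above); the function name is also an assumption. Each support point $x_j$ contributes a rectangle of height $p(x_j)/(2b)$ on $[x_j - b, x_j + b]$, where b is the bandwidth.

```python
def uniform_kernel_density(points, probs, bandwidth):
    """Return f(x) = sum_j p(x_j) K_j(x) with K_j uniform on [x_j - b, x_j + b]."""
    height = 1.0 / (2.0 * bandwidth)

    def density(x):
        return sum(p * height for xj, p in zip(points, probs)
                   if abs(x - xj) <= bandwidth)

    return density

# Hypothetical discrete model (NOT necessarily Model 6 from the text).
points = [3.0, 5.0, 6.0, 9.0, 11.0]
probs = [0.125, 0.125, 0.375, 0.25, 0.125]
f = uniform_kernel_density(points, probs, bandwidth=2.0)

# Midpoint Riemann sum over [-1, 14]; the smoothed density should integrate to 1.
step = 0.01
total = sum(f(-1.0 + (i + 0.5) * step) for i in range(1500)) * step
print(f(2.0), total)
```

Written this way, the smoothed model is visibly a mixture of uniform distributions with weights equal to the original probabilities.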


Note that both the kernel smoothing model and the empirical distribution 
can also be written as mixture distributions. The reason these models are 
classified separately is that the number of components relates to the sample 
size rather than to the phenomenon and its random variable. 


4.2.5 Exercises 


4.1 Demonstrate that the lognormal distribution as parameterized in Appen- 
dix A is a scale distribution but has no scale parameter. Display an alternative 
parametrization of this distribution that does have a scale parameter. 


4.2 Which of Models 1-6 could be considered as members of a parametric 
distribution? For those that are, name or describe the distribution.
Fig. 4.2 Kernel density distribution.


4.3 (*) Claims have a Pareto distribution with $\alpha = 2$ and $\theta$ unknown. Claims
the following year experience 6% uniform inflation. Let r be the ratio of the
proportion of claims that will exceed d next year to the proportion of claims
that exceed d this year. Determine the limit of r as d goes to infinity.


4.4 Determine the mean and second moment of the two-point mixture
distribution in Example 4.9. The solution to this exercise provides general formulas
for raw moments of a mixture distribution.


4.5 Determine expressions for the mean and variance of the mixture of
gammas distribution.


4.6 Which of Models 1-6 could be considered to be from parametric
distribution families? Which could be considered to be from variable-component
mixture distributions?

4.7 (*) Seventy-five percent of claims have a normal distribution with a mean
of 3,000 and a variance of 1,000,000. The remaining 25% have a normal
distribution with a mean of 4,000 and a variance of 1,000,000. Determine the
probability that a randomly selected claim exceeds 5,000.


4.8 (*) Let X have a Burr distribution with parameters $\alpha = 1$, $\gamma = 2$, and
$\theta = \sqrt{1{,}000}$, and let Y have a Pareto distribution with parameters $\alpha = 1$
and $\theta = 1{,}000$. Let Z be a mixture of X and Y with equal weight on each
component. Determine the median of Z. Let $W = 1.1Z$. Demonstrate that
W is also a mixture of a Burr and a Pareto distribution and determine the
parameters of W.


4.9 (*) Consider three random variables: X is a mixture of a uniform
distribution on the interval 0 to 2 and a uniform distribution on the interval 0 to 3;
Y is the sum of two random variables, one is uniform on 0 to 2 and the other
is uniform on 0 to 3; Z is a normal distribution that has been right censored
at 1. Match these random variables with the following descriptions:


(a) Both the distribution and density functions are continuous.


(b) The distribution function is continuous but the density function is
discontinuous.


(c) The distribution function is discontinuous.


4.10 Demonstrate that the model in Example 4.14 is a mixture of uniform
distributions.


4.11 Show that the inverse Gaussian distribution as parameterized in Ap- 
pendix A is a scale family but does not have a scale parameter. 


4.12 Show that the Weibull distribution has a scale parameter. 


4.3 TAIL WEIGHT 


The tail of a distribution (more properly, the right tail) is that part that re- 
veals probabilities about large values. It is of interest to actuaries because it is 
the occurrence (or lack thereof) of large values that is most influential on
profits. Risky types of insurance such as medical malpractice feature more large
claims (relative to the mean) than less risky insurances such as automobile 
physical damage. Random variables that tend to assign higher probabilities to 
large values are said to be heavy tailed. Tail weight can be a relative concept 
(model A has a heavier tail than model B) or an absolute concept (distribu- 
tions with a certain property are classified as heavy tailed). When choosing 
models, tail weight can help narrow the choices or can confirm someone else’s 
choice. For example, when someone models medical malpractice payments 
with a Pareto distribution, it seems reasonable as the Pareto distribution is 
regarded as having a heavy tail. Conversely, the light-tailed lognormal distri- 
bution may be a reasonable model for dental insurance payments. However, 
it should be noted that various measures of tail weight need not agree. 


4.3.1 Existence of moments 


Recall that in the continuous case the kth raw moment for a random variable
that takes on only positive values (like most insurance payment variables) is
given by $\int_0^\infty x^k f(x)\,dx$. Depending on the density function and the value of
k, this integral may not exist. If the density function is too large for large
values of x, then, when multiplied by the large number $x^k$, the values will be
too large for the integral to converge. Thus, existence of all positive moments
indicates a light right tail, while existence of only positive moments up to a
certain value (or existence of no positive moments at all) indicates a heavy
right tail.


Example 4.15 Demonstrate that for the gamma distribution all positive raw 
moments exist but for the Pareto distribution they do not. 


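For the gamma distribution

\[
\begin{aligned}
\mu_k' &= \int_0^\infty x^k \frac{x^{\alpha-1} e^{-x/\theta}}{\Gamma(\alpha)\theta^{\alpha}}\,dx \\
&= \int_0^\infty (y\theta)^k \frac{(y\theta)^{\alpha-1} e^{-y}}{\Gamma(\alpha)\theta^{\alpha}}\,\theta\,dy, \quad \text{making the substitution } y = x/\theta \\
&= \frac{\theta^k \Gamma(\alpha + k)}{\Gamma(\alpha)} < \infty \quad \text{for all } k > 0,
\end{aligned}
\]

while for the Pareto distribution

\[
\begin{aligned}
\mu_k' &= \int_0^\infty x^k \frac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha+1}}\,dx \\
&= \int_\theta^\infty (y - \theta)^k \frac{\alpha\theta^{\alpha}}{y^{\alpha+1}}\,dy, \quad \text{making the substitution } y = x + \theta \\
&= \alpha\theta^{\alpha} \int_\theta^\infty \sum_{j=0}^{k} \binom{k}{j} y^{j-\alpha-1} (-\theta)^{k-j}\,dy \quad \text{for integer values of } k.
\end{aligned}
\]

The integral exists only if all of the exponents on y in the sum are less than
$-1$. That is, if $j - \alpha - 1 < -1$ for all j, or, equivalently, if $k < \alpha$. Therefore,
only some moments exist. □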


By this reckoning, the Pareto distribution is said to have a heavier tail
than the gamma distribution. A look at the moment formulas in Appendix
A reveals which distributions have heavy tails and which do not, as indicated
by the existence of moments.


³In a similar manner, existence of all negative moments indicates a light left tail. A feel
for the left tail may aid in choosing an appropriate distributional model. The following
observations apply to density functions that are monotonic and differentiable near zero.
The value of f(0) and the slope of the density function near zero are related to the
[…]. Suppose that negative moments exist […] $-r$. If $r < 1$, f(x) goes to infinity as
$x \to 0$. If $r = 1$, f(0) is a […] number. If $1 < r < 2$, $f(0) = 0$ and the slope goes to
infinity as $x \to 0$. If $r = 2$, $f(0) = 0$ and the slope at 0 is a nonnegative number. If
$r > 2$, $f(0) = 0$ and the slope at 0 is 0, so only a small portion of the distribution is
near zero.
As an example, the Weibull distribution with $\tau = 0.2$ has been used for workers
compensation […]. The value of r is 0.2, which means that a lot of probability is near
zero. […] to produce a mean of 30,000 gives a 28% chance of a claim being less
than 1 and a 1% chance of it exceeding 500,000. This may be a reasonable model for
[…] (or a way to use part of a model). In contrast, the lognormal distribution
with the same mean and variance has less than 0.1% probability of being less than 1
[…] also about 1% probability of exceeding 500,000. The lognormal distribution has all
negative moments and thus has f(x) go to zero as x goes to zero.


4.3.2 Limiting ratios


An indication that one distribution has a heavier tail than another is that 
the ratio of the two survival functions should diverge to infinity (with the 
heavier tailed distribution in the numerator). This implies that the numerator 
distribution puts significantly more probability on large values. It is equivalent 
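to examine the ratio of density functions. The limit will be the same, as can
be seen by an application of L'Hôpital's rule:

\[
\lim_{x\to\infty} \frac{S_1(x)}{S_2(x)} = \lim_{x\to\infty} \frac{S_1'(x)}{S_2'(x)} = \lim_{x\to\infty} \frac{-f_1(x)}{-f_2(x)}.
\]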


Example 4.16 Demonstrate that the Pareto distribution has a heavier tail 
than the gamma distribution using the limit of the ratio of their density func- 
tions. 


To avoid confusion, the letters $\tau$ and $\lambda$ will be used for the parameters of
the gamma distribution instead of the customary $\alpha$ and $\theta$. Then the required
limit is

\[
\begin{aligned}
\lim_{x\to\infty} \frac{f_{\text{Pareto}}(x)}{f_{\text{gamma}}(x)}
&= \lim_{x\to\infty} \frac{\alpha\theta^{\alpha}(x + \theta)^{-\alpha-1}}{x^{\tau-1} e^{-x/\lambda} \lambda^{-\tau} \Gamma(\tau)^{-1}} \\
&= c \lim_{x\to\infty} \frac{e^{x/\lambda}}{(x + \theta)^{\alpha+1} x^{\tau-1}} \\
&> c \lim_{x\to\infty} \frac{e^{x/\lambda}}{(x + \theta)^{\alpha+\tau}}
\end{aligned}
\]

and, either by application of L'Hôpital's rule or by remembering that
exponentials go to infinity faster than polynomials, the limit is infinity. Figure 4.3
shows a portion of the density functions for a Pareto distribution with
parameters $\alpha = 3$ and $\theta = 10$ and a gamma distribution with parameters $\alpha = 1/3$
and $\theta = 15$. Both distributions have a mean of 5 and a variance of 75. The
graph is consistent with the algebraic derivation. □
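
The limiting ratio in Example 4.16 can be illustrated numerically. The Python sketch below is only a check under stated assumptions: it uses a Pareto distribution with $\alpha = 3$, $\theta = 10$ and a gamma distribution with $\alpha = 1/3$, $\theta = 15$ (the values that give both a mean of 5 and a variance of 75), and the evaluation points are arbitrary choices of ours.

```python
import math

def pareto_pdf(x, alpha, theta):
    """Pareto density alpha * theta**alpha / (x + theta)**(alpha + 1)."""
    return alpha * theta ** alpha / (x + theta) ** (alpha + 1)

def gamma_pdf(x, alpha, theta):
    """Gamma density with shape alpha and scale theta (x > 0), via logs."""
    log_pdf = ((alpha - 1) * math.log(x) - x / theta
               - alpha * math.log(theta) - math.lgamma(alpha))
    return math.exp(log_pdf)

# Both distributions have mean 5 and variance 75.
pareto_mean = 10.0 / (3.0 - 1.0)
gamma_mean = (1.0 / 3.0) * 15.0
gamma_var = (1.0 / 3.0) * 15.0 ** 2

# Ratio of Pareto pdf to gamma pdf at increasingly large arguments.
ratios = [pareto_pdf(x, 3.0, 10.0) / gamma_pdf(x, 1.0 / 3.0, 15.0)
          for x in (50.0, 100.0, 200.0, 400.0)]
print(ratios)
```

The ratio grows without bound as x increases, which is the numerical counterpart of the statement that the Pareto tail is heavier.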


4.3.3 Hazard rate and mean residual life patterns 


The nature of the hazard rate function also reveals information about the tail 
of the distribution. If the hazard rate function is decreasing, then at large 
values the chance of that value becomes small and the chance of larger values 
becomes greater. Thus the distribution will have a heavier tail. Conversely, 
if the hazard rate function is increasing, a lighter tail is expected. 
Fig. 4.3 Tails of gamma and Pareto distributions.


Example 4.17 Compare the tails of the Pareto and gamma distributions by 
looking at their hazard rate functions. 


The hazard rate function for the Pareto distribution is

\[
h(x) = \frac{f(x)}{S(x)} = \frac{\alpha\theta^{\alpha}(x + \theta)^{-\alpha-1}}{\theta^{\alpha}(x + \theta)^{-\alpha}} = \frac{\alpha}{x + \theta},
\]

which is decreasing. For the gamma distribution we need to be a bit more
clever because there is no closed form expression for S(x). Observe that

\[
\frac{1}{h(x)} = \frac{\int_x^\infty f(t)\,dt}{f(x)} = \frac{\int_0^\infty f(x + y)\,dy}{f(x)},
\]

and so, if $f(x + y)/f(x)$ is an increasing function of x for any fixed y, then
$1/h(x)$ will be increasing in x and so the random variable will have a decreasing
hazard rate. Now, for the gamma distribution

\[
\frac{f(x + y)}{f(x)} = \frac{(x + y)^{\alpha-1} e^{-(x+y)/\theta}}{x^{\alpha-1} e^{-x/\theta}} = \left(1 + \frac{y}{x}\right)^{\alpha-1} e^{-y/\theta},
\]

which is strictly increasing in x provided $\alpha < 1$ and strictly decreasing in
x if $\alpha > 1$. By this measure, some gamma distributions have a heavy tail
(those with $\alpha < 1$) and some have a light tail. Note that when $\alpha = 1$ we have
the exponential distribution and a constant hazard rate. Also, even though
h(x) is complicated in the gamma case, we know what happens for large x.
Because f(x) and S(x) both go to 0 as $x \to \infty$, L'Hôpital's rule yields

\[
\begin{aligned}
\lim_{x\to\infty} h(x) &= \lim_{x\to\infty} \frac{f(x)}{S(x)} = -\lim_{x\to\infty} \frac{f'(x)}{f(x)} = -\lim_{x\to\infty} \frac{d}{dx} \ln f(x) \\
&= -\lim_{x\to\infty} \frac{d}{dx}\left[(\alpha - 1)\ln x - \frac{x}{\theta}\right] = \lim_{x\to\infty}\left(\frac{1}{\theta} - \frac{\alpha - 1}{x}\right) = \frac{1}{\theta}.
\end{aligned}
\]

That is, $h(x) \to 1/\theta$ as $x \to \infty$. □
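
The limit $h(x) \to 1/\theta$ can be seen numerically in a special case. The Python sketch below assumes $\alpha = 2$, for which the gamma survival function does have the closed form $S(x) = e^{-x/\theta}(1 + x/\theta)$; this special case, the value $\theta = 10$, and the function names are our own choices for illustration, not part of Example 4.17.

```python
import math

def erlang2_pdf(x, theta):
    """Gamma density with alpha = 2: f(x) = x * exp(-x/theta) / theta**2."""
    return x * math.exp(-x / theta) / theta ** 2

def erlang2_sf(x, theta):
    """Survival function for alpha = 2: S(x) = exp(-x/theta) * (1 + x/theta)."""
    return math.exp(-x / theta) * (1.0 + x / theta)

theta = 10.0  # hypothetical scale parameter
# Hazard rate h(x) = f(x) / S(x) at increasingly large arguments.
hazards = [erlang2_pdf(x, theta) / erlang2_sf(x, theta)
           for x in (1.0, 10.0, 100.0, 1000.0, 5000.0)]
print(hazards, 1.0 / theta)
```

Since $\alpha = 2 > 1$, the hazard rate is increasing, and it approaches $1/\theta = 0.1$ from below, in line with the light-tail classification.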


The mean residual life also gives information about tail weight. If the 
mean residual life function is increasing in d, then at large values the expected 
outcome is much larger and thus probability is moved to the right, indicating 
a heavier tail than a model where the mean residual life function is decreasing 
or is increasing at a slower rate. In fact, the mean residual life function and 
the hazard rate are closely related in several ways. First, note that 


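\[
\frac{S(y + d)}{S(d)} = \frac{\exp\left[-\int_0^{y+d} h(x)\,dx\right]}{\exp\left[-\int_0^{d} h(x)\,dx\right]}
= \exp\left[-\int_d^{y+d} h(x)\,dx\right] = \exp\left[-\int_0^{y} h(d + t)\,dt\right].
\]

Therefore, if the hazard rate is decreasing, then for fixed y it follows that
$\int_0^y h(d + t)\,dt$ is a decreasing function of d, and from the above $S(y + d)/S(d)$
is an increasing function of d. But from (3.5), the mean residual life function
may be expressed as

\[
e(d) = \frac{\int_d^\infty S(x)\,dx}{S(d)} = \int_0^\infty \frac{S(y + d)}{S(d)}\,dy.
\]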
d 


Thus, if the hazard rate is a decreasing function, then the mean residual 
life function e(d) is an increasing function of d because the same is true of 
S(y+d)/S(d) for fixed y. Similarly, if the hazard rate is an increasing function, 
then the mean residual life function is a decreasing function. It is worth noting 
(and is perhaps counterintuitive), however, that the converse implication is not 
true. Exercise 4.16 gives an example of a distribution which has a decreasing 
mean residual life function, but the hazard rate is not increasing for all values. 
Nevertheless, the implications described above are generally consistent with 
the above discussions of heaviness of the tail. 

There is a second relationship between the mean residual life function and
the hazard rate. As $d \to \infty$, S(d) and $\int_d^\infty S(x)\,dx$ go to 0. Thus, the limiting
behavior of the mean residual life function as $d \to \infty$ may be ascertained via
L'Hôpital's rule because (3.5) holds. We have

\[
\lim_{d\to\infty} e(d) = \lim_{d\to\infty} \frac{\int_d^\infty S(x)\,dx}{S(d)} = \lim_{d\to\infty} \frac{-S(d)}{-f(d)} = \lim_{d\to\infty} \frac{1}{h(d)}
\]

as long as the indicated limits exist. These limiting relationships are useful if
S(x) [and hence also h(x) and e(d)] is complicated.


Example 4.18 Examine the behavior of the mean residual life function of 
the gamma distribution. 
Because $e(d) = \int_d^\infty S(x)\,dx / S(d)$ and S(x) is complicated, e(d) is
complicated. But $e(0) = E(X) = \alpha\theta$ from Appendix A, and, using Example 4.17,
we have

\[
\lim_{x\to\infty} e(x) = \lim_{x\to\infty} \frac{1}{h(x)} = \frac{1}{\lim_{x\to\infty} h(x)} = \theta.
\]

Also, from Example 4.17, h(x) is strictly decreasing in x for $\alpha < 1$ and
strictly increasing in x for $\alpha > 1$, implying that e(d) is strictly increasing
from $e(0) = \alpha\theta$ to $e(\infty) = \theta$ for $\alpha < 1$ and strictly decreasing from $e(0) = \alpha\theta$
to $e(\infty) = \theta$ for $\alpha > 1$. For $\alpha = 1$, we have the exponential distribution for
which $e(d) = \theta$. □


Further insight into the mean residual lifetime and the heaviness of the tail 
may be obtained by introducing the so-called equilibrium distribution, which 
has important applications in connection with the continuous time ruin model 
of Chapter 8. For positive random variables with $S(0) = 1$, it follows from
Definition 3.6 and (3.5) with $d = 0$ that $E(X) = \int_0^\infty S(x)\,dx$, or equivalently
$1 = \int_0^\infty [S(x)/E(X)]\,dx$, so that

\[
f_e(x) = \frac{S(x)}{E(X)}, \quad x \ge 0, \tag{4.2}
\]

is a probability density function. The corresponding survival function is

\[
S_e(x) = \int_x^\infty f_e(t)\,dt = \frac{\int_x^\infty S(t)\,dt}{E(X)}, \quad x \ge 0.
\]

The hazard rate corresponding to the equilibrium distribution is

\[
h_e(x) = \frac{f_e(x)}{S_e(x)} = \frac{S(x)}{\int_x^\infty S(t)\,dt} = \frac{1}{e(x)}
\]

using (3.5). Thus, the reciprocal of the mean residual life function is itself a
hazard rate, and this fact may be used to show that the mean residual life
function uniquely characterizes the original distribution. We have

\[
f_e(x) = h_e(x) S_e(x) = h_e(x) \exp\left[-\int_0^x h_e(t)\,dt\right]
\]

or equivalently

\[
S(x) = \frac{e(0)}{e(x)} \exp\left[-\int_0^x \frac{1}{e(t)}\,dt\right]
\]

using $e(0) = E(X)$.
The equilibrium distribution also provides further insight into the
relationship between the hazard rate, the mean residual life function, and
heaviness of the tail. Assuming that $S(0) = 1$, and thus $e(0) = E(X)$, we have
$\int_x^\infty S(t)\,dt = e(0) S_e(x)$, and from (3.5), $\int_x^\infty S(t)\,dt = e(x) S(x)$. Equating
these two expressions results in

\[
\frac{e(x)}{e(0)} = \frac{S_e(x)}{S(x)}.
\]

If the mean residual life function is increasing (implied if the hazard rate is
decreasing), then $e(x) \ge e(0)$, which is obviously equivalent to $S_e(x) \ge S(x)$
from the above equality. This in turn implies that

\[
\int_0^\infty S_e(x)\,dx \ge \int_0^\infty S(x)\,dx.
\]

But $E(X) = \int_0^\infty S(x)\,dx$ from Definition 3.6 and (3.5) if $S(0) = 1$. Also,
$\int_0^\infty S_e(x)\,dx = \int_0^\infty x f_e(x)\,dx$, since both sides represent the mean of the
equilibrium distribution. This may be evaluated using (3.9) with $u = \infty$, $k = 2$,
and $F(0) = 0$ to give the equilibrium mean, that is,

\[
\int_0^\infty S_e(x)\,dx = \int_0^\infty x f_e(x)\,dx = \frac{1}{E(X)} \int_0^\infty x S(x)\,dx = \frac{E(X^2)}{2E(X)}.
\]

The inequality may thus be expressed as

\[
\frac{E(X^2)}{2E(X)} \ge E(X),
\]

or using $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ as $\mathrm{Var}(X) \ge [E(X)]^2$. That is, the
squared coefficient of variation, and hence the coefficient of variation itself, is
at least 1 if $e(x) \ge e(0)$. Reversing the inequalities implies that the coefficient
of variation is at most 1 if $e(x) \le e(0)$, in turn implied if the mean residual
life function is decreasing or the hazard rate is increasing. These values of the
coefficient of variation are consistent with the comments made here about the
heaviness of the tail.


4.3.4 Exercises 


4.13 Using the methods in this section (except for the mean residual life), 
compare the tail weight of the Weibull and inverse Weibull distributions. 


4.14 Arguments as in Example 4.16 place the lognormal distribution
between the gamma and Pareto distributions with regard to tail weight. To
reinforce this conclusion, consider a gamma distribution with parameters
$\alpha = 0.2$, $\theta = 500$; a lognormal distribution with parameters $\mu = 3.709290$,
$\sigma = 1.338566$; and a Pareto distribution with parameters $\alpha = 2.5$, $\theta = 150$.
First demonstrate that all three distributions have the same mean and
variance. Then numerically demonstrate that there is a value such that the gamma
pdf is smaller than the lognormal and Pareto pdfs for all arguments above that
value and that there is another value such that the lognormal pdf is smaller
than the Pareto pdf for all arguments above that value.


4.15 Let Y be a random variable that has the equilibrium density from (4.2).
That is, $f_Y(y) = f_e(y) = S_X(y)/E(X)$ for some random variable X. Use
integration by parts to show that

\[
M_Y(t) = \frac{M_X(t) - 1}{t\,E(X)}
\]

whenever $M_X(t)$ exists.


4.16 You are given that the random variable X has probability density
function $f(x) = (1 + 2x^2)e^{-2x}$, $x \ge 0$.

(a) Determine the survival function S(x).

(b) Determine the hazard rate h(x).

(c) Determine the survival function $S_e(x)$ of the equilibrium
distribution.

(d) Determine the mean residual life function e(x).

(e) Determine $\lim_{x\to\infty} h(x)$ and $\lim_{x\to\infty} e(x)$.

(f) Prove that e(x) is strictly decreasing but h(x) is not strictly
increasing.


4.17 Assume that X has probability density function f(x), $x \ge 0$.


(a) Prove that

\[
S_e(x) = \frac{\int_x^\infty (y - x) f(y)\,dy}{E(X)}.
\]

(b) Use (a) to show that

\[
\int_x^\infty y f(y)\,dy = x S(x) + E(X) S_e(x).
\]

(c) Prove that (b) may be rewritten as

\[
S(x) = \frac{\int_x^\infty y f(y)\,dy}{x + e(x)}
\]

and that this in turn implies that

\[
S(x) \le \frac{E(X)}{x + e(x)}.
\]

(d) Use (c) to prove that, if $e(x) \ge e(0)$, then

\[
S(x) \le \frac{E(X)}{x + E(X)}
\]

and that this in turn implies that the mean is at least as large as
the (smallest) median.


(e) Prove that (b) may be rewritten as

\[
S_e(x) = \frac{e(x)}{x + e(x)} \cdot \frac{\int_x^\infty y f(y)\,dy}{E(X)}
\]

and thus that

\[
S_e(x) \le \frac{e(x)}{x + e(x)}.
\]

4.4 CREATING NEW DISTRIBUTIONS 


4.4.1 Introduction 

This section indicates how new parametric distributions can be created from 
existing ones. Many of the distributions in Appendix A were created this way. 
4.4.2 Multiplication by a constant 


This transformation is equivalent to applying inflation uniformly across all 
loss levels and is known as a change of scale. For example, if this year’s losses 
are given by the random variable X, then uniform inflation of 5% indicates 
that next year’s losses can be modeled with the random variable Y = 1.05X. 


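Theorem 4.19 Let X be a continuous random variable with pdf $f_X(x)$ and
cdf $F_X(x)$. Let $Y = \theta X$ with $\theta > 0$. Then

\[
F_Y(y) = F_X\left(\frac{y}{\theta}\right), \quad f_Y(y) = \frac{1}{\theta} f_X\left(\frac{y}{\theta}\right).
\]

Proof:

\[
F_Y(y) = \Pr(Y \le y) = \Pr(\theta X \le y) = \Pr\left(X \le \frac{y}{\theta}\right) = F_X\left(\frac{y}{\theta}\right),
\]
\[
f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{1}{\theta} f_X\left(\frac{y}{\theta}\right).
\]

Corollary 4.20 The parameter $\theta$ is a scale parameter for the random variable
Y.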


The following example illustrates this process. 
Example 4.21 Let X have pdf $f(x) = e^{-x}$, $x > 0$. Determine the cdf and
pdf of $Y = \theta X$.

\[
F_X(x) = 1 - e^{-x}, \quad F_Y(y) = 1 - e^{-y/\theta},
\]
\[
f_Y(y) = \frac{1}{\theta} e^{-y/\theta}.
\]

We recognize this as the exponential distribution. □


4.4.3 Raising to a power 


Theorem 4.22 Let X be a continuous random variable with pdf $f_X(x)$ and
cdf $F_X(x)$ with $F_X(0) = 0$. Let $Y = X^{1/\tau}$. Then, if $\tau > 0$,

\[
F_Y(y) = F_X(y^{\tau}), \quad f_Y(y) = \tau y^{\tau-1} f_X(y^{\tau}), \quad y > 0,
\]

while, if $\tau < 0$,

\[
F_Y(y) = 1 - F_X(y^{\tau}), \quad f_Y(y) = -\tau y^{\tau-1} f_X(y^{\tau}). \tag{4.3}
\]

Proof: If $\tau > 0$,

\[
F_Y(y) = \Pr(X \le y^{\tau}) = F_X(y^{\tau}),
\]

while if $\tau < 0$,

\[
F_Y(y) = \Pr(X \ge y^{\tau}) = 1 - F_X(y^{\tau}).
\]

The pdf follows by differentiation. □


It is more common to keep parameters positive and so, when $\tau$ is negative,
create a new parameter $\tau^* = -\tau$. Then (4.3) becomes

\[
F_Y(y) = 1 - F_X(y^{-\tau^*}), \quad f_Y(y) = \tau^* y^{-\tau^*-1} f_X(y^{-\tau^*}).
\]

Drop the asterisk for future use of this positive parameter.


Definition 4.23 When raising a distribution to a power, if $\tau > 0$ the
resulting distribution is called transformed, if $\tau = -1$ it is called inverse, and
if $\tau < 0$ (but is not $-1$) it is called inverse transformed. To create the
distributions in Appendix A and to retain $\theta$ as a scale parameter, the base
distribution should be raised to a power before being multiplied by $\theta$.


Example 4.24 Suppose X has the exponential distribution. Determine the 
cdf of the inverse, transformed, and inverse transformed exponential distribu- 
tions. 


The inverse exponential distribution with no scale parameter has cdf

$$F(y) = 1 - [1 - e^{-1/y}] = e^{-1/y}.$$

With the scale parameter added it is $F(y) = e^{-\theta/y}$.

The transformed exponential distribution with no scale parameter has cdf

$$F(y) = 1 - \exp(-y^{\tau}).$$

With the scale parameter added it is $F(y) = 1 - \exp[-(y/\theta)^{\tau}]$. This distribution
is more commonly known as the Weibull distribution.

The inverse transformed exponential distribution with no scale parameter
has cdf

$$F(y) = 1 - [1 - \exp(-y^{-\tau})] = \exp(-y^{-\tau}).$$

With the scale parameter added it is $F(y) = \exp[-(\theta/y)^{\tau}]$. This distribution
is the inverse Weibull. □
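
The three cdfs of Example 4.24 are simple enough to code directly. The sketch below is illustrative only; the function names and parameter values are assumptions, not notation from the text. It checks that the reciprocal of a Weibull variable with scale $\theta$ follows the inverse Weibull cdf with scale $1/\theta$, and that the inverse Weibull with $\tau = 1$ reduces to the inverse exponential.

```python
import math

def weibull_cdf(y, theta, tau):
    """Transformed exponential with scale: F(y) = 1 - exp[-(y/theta)^tau]."""
    return 1.0 - math.exp(-((y / theta) ** tau))

def inverse_weibull_cdf(y, theta, tau):
    """Inverse transformed exponential with scale: F(y) = exp[-(theta/y)^tau]."""
    return math.exp(-((theta / y) ** tau))

def inverse_exponential_cdf(y, theta):
    """Inverse exponential with scale: F(y) = exp(-theta/y)."""
    return math.exp(-theta / y)

theta, tau, w = 2.0, 1.5, 0.8  # hypothetical parameter values

# If Y is Weibull(theta, tau), then Pr(1/Y <= w) = Pr(Y >= 1/w).
lhs = 1.0 - weibull_cdf(1.0 / w, theta, tau)
rhs = inverse_weibull_cdf(w, 1.0 / theta, tau)
print(lhs, rhs)
```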


Another base distribution has pdf $f(x) = x^{\alpha-1}e^{-x}/\Gamma(\alpha)$. When a scale
parameter is added, this becomes the gamma distribution. It has inverse
and transformed versions that can be created using the results in this section.
Unlike the distributions introduced to this point, this one does not have a
closed-form cdf. The best we can do is define notation for the function.


Definition 4.25 The incomplete gamma function with parameter $\alpha > 0$
is denoted and defined by

$$\Gamma(\alpha; x) = \frac{1}{\Gamma(\alpha)} \int_0^x t^{\alpha-1} e^{-t}\,dt,$$

while the gamma function is denoted and defined by

$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt.$$


In addition, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ and for positive integer values of
$n$, $\Gamma(n) = (n-1)!$. Appendix A provides details on numerical methods of
evaluating these quantities. Furthermore, these functions are built into most
spreadsheet programs and many statistical and numerical analysis programs.
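
As an illustration of such a numerical evaluation, the sketch below computes $\Gamma(\alpha; x)$ from a standard power series, $\Gamma(\alpha; x) = \frac{x^{\alpha}e^{-x}}{\Gamma(\alpha)}\sum_{n\ge 0} \frac{x^n}{\alpha(\alpha+1)\cdots(\alpha+n)}$. This is one common approach and is not claimed to be the method given in Appendix A; the function name `incomplete_gamma` and its tolerance arguments are assumptions. The checks use the closed forms available for $\alpha = 1$ and $\alpha = 2$.

```python
import math

def incomplete_gamma(alpha, x, tol=1e-14, max_terms=10_000):
    """Gamma(alpha; x) = (1/Gamma(alpha)) * integral_0^x t^(alpha-1) e^(-t) dt.

    Series: x^alpha e^(-x) / Gamma(alpha) * sum_n x^n / (alpha (alpha+1) ... (alpha+n)).
    """
    if x <= 0:
        return 0.0
    term = 1.0 / alpha          # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= x / (alpha + n)  # ratio of consecutive series terms
        total += term
        if term < tol * total:
            break
    return total * math.exp(alpha * math.log(x) - x - math.lgamma(alpha))

print(math.gamma(5))  # Gamma(n) = (n - 1)! for a positive integer n
print(incomplete_gamma(1.0, 1.5), 1 - math.exp(-1.5))
print(incomplete_gamma(2.0, 1.5), 1 - math.exp(-1.5) * (1 + 1.5))
```

The series converges for every $x > 0$ but slowly when $x$ is large relative to $\alpha$, where other expansions are usually preferred.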


4.4.4 Exponentiation 


Theorem 4.26 Let $X$ be a continuous random variable with pdf $f_X(x)$ and
cdf $F_X(x)$ with $f_X(x) > 0$ for all real $x$. Let $Y = \exp(X)$. Then, for $y > 0$,

$$F_Y(y) = F_X(\ln y), \qquad f_Y(y) = \frac{1}{y} f_X(\ln y).$$


Proof: $F_Y(y) = \Pr(e^X \le y) = \Pr(X \le \ln y) = F_X(\ln y)$. □


Example 4.27 Let $X$ have the normal distribution with mean $\mu$ and variance
$\sigma^2$. Determine the cdf and pdf of $Y = e^X$.

$$F_Y(y) = \Phi\left(\frac{\ln y - \mu}{\sigma}\right),$$

$$f_Y(y) = \frac{1}{y\sigma}\,\phi\left(\frac{\ln y - \mu}{\sigma}\right) = \frac{1}{y\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln y - \mu}{\sigma}\right)^2\right]. \qquad \square$$


We could try to add a scale parameter by creating $W = \theta Y$, but this
adds no value, as is demonstrated in Exercise 4.22. This example created the
lognormal distribution (the name has stuck even though “expnormal” would
seem more descriptive).
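
A short numerical sketch makes the remark about the scale parameter concrete. It is an illustration only: the function names and the values of $\mu$, $\sigma$, and $\theta$ are assumptions. The code evaluates the lognormal cdf and pdf from Example 4.27 and compares the cdf of $W = \theta Y$ with that of a lognormal whose first parameter is $\mu + \ln\theta$.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def lognormal_cdf(y, mu, sigma):
    """Theorem 4.26 with X normal: F_Y(y) = Phi((ln y - mu) / sigma)."""
    return std_normal_cdf((math.log(y) - mu) / sigma)

def lognormal_pdf(y, mu, sigma):
    """Theorem 4.26 with X normal: f_Y(y) = phi((ln y - mu) / sigma) / (y sigma)."""
    return std_normal_pdf((math.log(y) - mu) / sigma) / (y * sigma)

mu, sigma, theta = 0.3, 0.8, 2.5  # hypothetical values
w = 4.0
cdf_of_scaled = lognormal_cdf(w / theta, mu, sigma)               # Pr(theta * Y <= w)
cdf_shifted_mu = lognormal_cdf(w, mu + math.log(theta), sigma)    # lognormal(mu + ln theta, sigma)
print(cdf_of_scaled, cdf_shifted_mu)
```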


4.4.5 Mixing 


The concept of mixing can be extended from mixing a finite number of random
variables to mixing an uncountable number. In the following theorem, the pdf
$f_\Lambda(\lambda)$ plays the role of the discrete probabilities $a_j$ in the k-point mixture.


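Theorem 4.28 Let $X$ have pdf $f_{X|\Lambda}(x|\lambda)$ and cdf $F_{X|\Lambda}(x|\lambda)$, where $\lambda$ is a
parameter of $X$. While $X$ may have other parameters, they are not relevant.
Let $\lambda$ be a realization of the random variable $\Lambda$ with pdf $f_\Lambda(\lambda)$. Then the
unconditional pdf of $X$ is

$$f_X(x) = \int f_{X|\Lambda}(x|\lambda) f_\Lambda(\lambda)\,d\lambda, \tag{4.4}$$

where the integral is taken over all values of $\lambda$ with positive probability. The
resulting distribution is a mixture distribution. The distribution function
can be determined from

$$\begin{aligned}
F_X(x) &= \int_{-\infty}^{x} \int f_{X|\Lambda}(y|\lambda) f_\Lambda(\lambda)\,d\lambda\,dy \\
&= \int \int_{-\infty}^{x} f_{X|\Lambda}(y|\lambda) f_\Lambda(\lambda)\,dy\,d\lambda \\
&= \int F_{X|\Lambda}(x|\lambda) f_\Lambda(\lambda)\,d\lambda.
\end{aligned}$$

Moments of the mixture distribution can be found from

$$E(X^k) = E[E(X^k|\Lambda)]$$

and, in particular,

$$\mathrm{Var}(X) = E[\mathrm{Var}(X|\Lambda)] + \mathrm{Var}[E(X|\Lambda)].$$

Proof: The integrand is, by definition, the joint density of $X$ and $\Lambda$. The
integral is then the marginal density. For the expected value (assuming the
order of integration can be reversed),

$$\begin{aligned}
E(X^k) &= \int \int x^k f_{X|\Lambda}(x|\lambda) f_\Lambda(\lambda)\,d\lambda\,dx \\
&= \int \left[\int x^k f_{X|\Lambda}(x|\lambda)\,dx\right] f_\Lambda(\lambda)\,d\lambda \\
&= \int E(X^k|\lambda) f_\Lambda(\lambda)\,d\lambda \\
&= E[E(X^k|\Lambda)].
\end{aligned}$$

For the variance,

$$\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 \\
&= E[E(X^2|\Lambda)] - \{E[E(X|\Lambda)]\}^2 \\
&= E\{\mathrm{Var}(X|\Lambda) + [E(X|\Lambda)]^2\} - \{E[E(X|\Lambda)]\}^2 \\
&= E[\mathrm{Var}(X|\Lambda)] + \mathrm{Var}[E(X|\Lambda)]. \qquad \square
\end{aligned}$$


Note that, if $f_\Lambda(\lambda)$ is discrete, the integrals must be replaced with sums. An
alternative way to write the results is $f_X(x) = E_\Lambda[f_{X|\Lambda}(x|\Lambda)]$ and $F_X(x) =
E_\Lambda[F_{X|\Lambda}(x|\Lambda)]$, where the subscript on $E$ indicates that the random variable
is $\Lambda$.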

An interesting phenomenon is that mixture distributions tend to be heavy- 
tailed so this method is a good way to generate such a model. In particular, 
if $f_{X|\Lambda}(x|\lambda)$ has a decreasing hazard rate function for all $\lambda$, then the mixture
distribution will also have a decreasing hazard rate function (see Ross [114], 
pp. 407-409 for details). The following example shows how a familiar heavy- 
tailed distribution may be obtained by mixing. 


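Example 4.29 Let $X|\Lambda$ have an exponential distribution with parameter $1/\Lambda$.
Let $\Lambda$ have a gamma distribution. Determine the unconditional distribution
of $X$.


We have (note that the parameter $\theta$ in the gamma distribution has been
replaced by its reciprocal)

$$\begin{aligned}
f_X(x) &= \frac{\theta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} \lambda e^{-\lambda x} \lambda^{\alpha-1} e^{-\theta\lambda}\,d\lambda \\
&= \frac{\theta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} \lambda^{\alpha} e^{-\lambda(x+\theta)}\,d\lambda \\
&= \frac{\theta^{\alpha}}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(x+\theta)^{\alpha+1}} \\
&= \frac{\alpha\theta^{\alpha}}{(x+\theta)^{\alpha+1}}.
\end{aligned}$$

This is a Pareto distribution. □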


The following example provides an illustration useful in Chapter 16. 


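Example 4.30 Suppose that, given $\Theta = \theta$, $X$ is normally distributed with
mean $\theta$ and variance $v$, so that

$$f_{X|\Theta}(x|\theta) = \frac{1}{\sqrt{2\pi v}} \exp\left[-\frac{1}{2v}(x-\theta)^2\right], \qquad -\infty < x < \infty,$$

and $\Theta$ is itself normally distributed with mean $\mu$ and variance $a$, that is,

$$f_\Theta(\theta) = \frac{1}{\sqrt{2\pi a}} \exp\left[-\frac{1}{2a}(\theta-\mu)^2\right], \qquad -\infty < \theta < \infty.$$

Determine the marginal pdf of $X$.


The marginal pdf of $X$ is

$$\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi v}} \exp\left[-\frac{1}{2v}(x-\theta)^2\right] \frac{1}{\sqrt{2\pi a}} \exp\left[-\frac{1}{2a}(\theta-\mu)^2\right] d\theta \\
&= \frac{1}{2\pi\sqrt{va}} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2v}(x-\theta)^2 - \frac{1}{2a}(\theta-\mu)^2\right] d\theta.
\end{aligned}$$

We leave as an exercise for the reader the verification of the algebraic identity

$$\frac{(x-\theta)^2}{v} + \frac{(\theta-\mu)^2}{a} = \frac{a+v}{va}\left(\theta - \frac{ax+v\mu}{a+v}\right)^2 + \frac{(x-\mu)^2}{a+v}$$

obtained by completion of the square in $\theta$. Thus,

$$f_X(x) = \frac{\exp\left[-\dfrac{(x-\mu)^2}{2(a+v)}\right]}{\sqrt{2\pi(a+v)}} \int_{-\infty}^{\infty} \sqrt{\frac{a+v}{2\pi va}} \exp\left[-\frac{a+v}{2va}\left(\theta - \frac{ax+v\mu}{a+v}\right)^2\right] d\theta.$$

We recognize the integrand as the pdf (as a function of $\theta$) of a normal
distribution with mean $(ax + v\mu)/(a+v)$ and variance $(va)/(a+v)$. Thus the
integral is 1 and so

$$f_X(x) = \frac{\exp\left[-\dfrac{(x-\mu)^2}{2(a+v)}\right]}{\sqrt{2\pi(a+v)}}, \qquad -\infty < x < \infty;$$

that is, $X$ is normal with mean $\mu$ and variance $a + v$. □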
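
Both Example 4.29 and Example 4.30 evaluate the mixing integral (4.4) in closed form, so each can be checked by numerical integration. The sketch below is illustrative, not a definitive implementation: the midpoint rule, the truncation points of the integration ranges, and all parameter values are assumptions made for the demonstration.

```python
import math

def midpoint_integral(func, lo, hi, steps=20_000):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def exponential_gamma_mixture_pdf(x, alpha, theta):
    """Example 4.29: exponential with hazard lam, gamma mixing with rate theta."""
    def integrand(lam):
        mixing = theta ** alpha * lam ** (alpha - 1) * math.exp(-theta * lam) / math.gamma(alpha)
        return lam * math.exp(-lam * x) * mixing
    return midpoint_integral(integrand, 0.0, 40.0)  # upper limit 40 is an assumed truncation

def normal_normal_mixture_pdf(x, mu, a, v):
    """Example 4.30: X given theta is N(theta, v) and theta is N(mu, a)."""
    sd = math.sqrt(a)
    return midpoint_integral(lambda t: normal_pdf(x, t, v) * normal_pdf(t, mu, a),
                             mu - 12.0 * sd, mu + 12.0 * sd)

alpha, theta, x = 3.0, 2.0, 1.0          # hypothetical values
pareto = alpha * theta ** alpha / (x + theta) ** (alpha + 1)
print(exponential_gamma_mixture_pdf(x, alpha, theta), pareto)

mu, a, v, x2 = 1.0, 4.0, 9.0, 2.5        # hypothetical values
print(normal_normal_mixture_pdf(x2, mu, a, v), normal_pdf(x2, mu, a + v))
```

In each printed pair the numerical integral should agree closely with the closed-form Pareto or normal density.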


The following example is taken from Hayne [50]. It illustrates how this
type of mixture distribution can arise. In particular, continuous mixtures are
often used to provide a model for parameter uncertainty. That is, the exact
value of a parameter is not known, but a probability density function can be
elucidated to describe possible values of that parameter.


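Example 4.31 In the valuation of warranties on automobiles it is important
to recognize that the number of miles driven varies from driver to driver. It is
also the case that for a particular driver the number of miles varies from year
to year. Suppose the number of miles for a randomly selected driver has the
inverse Weibull distribution but that the year-to-year variation in the scale
parameter has the transformed gamma distribution with the same value for
$\tau$. Determine the distribution for the number of miles driven in a randomly
selected year by a randomly selected driver.


Using the parameterizations from Appendix A, the inverse Weibull for miles
driven in a year has parameters $\Lambda$ (in place of $\Theta$) and $\tau$ while the transformed
gamma distribution for the scale parameter $\Lambda$ has parameters $\tau$, $\theta$, and $\alpha$.
The marginal density is

$$\begin{aligned}
f(x) &= \int_0^{\infty} \frac{\tau\lambda^{\tau}}{x^{\tau+1}} \exp\left[-(\lambda/x)^{\tau}\right] \frac{\tau\lambda^{\tau\alpha-1}}{\theta^{\tau\alpha}\Gamma(\alpha)} \exp\left[-(\lambda/\theta)^{\tau}\right] d\lambda \\
&= \frac{\tau^2}{\theta^{\tau\alpha}\Gamma(\alpha)x^{\tau+1}} \int_0^{\infty} \lambda^{\tau+\tau\alpha-1} \exp\left[-\lambda^{\tau}(x^{-\tau}+\theta^{-\tau})\right] d\lambda \\
&= \frac{\tau^2}{\theta^{\tau\alpha}\Gamma(\alpha)x^{\tau+1}} \int_0^{\infty} \left[y^{1/\tau}(x^{-\tau}+\theta^{-\tau})^{-1/\tau}\right]^{\tau+\tau\alpha-1} e^{-y} \\
&\qquad\qquad \times\, y^{\tau^{-1}-1}\,\tau^{-1}(x^{-\tau}+\theta^{-\tau})^{-1/\tau}\,dy \\
&= \frac{\tau\,\Gamma(\alpha+1)}{\theta^{\tau\alpha}\Gamma(\alpha)x^{\tau+1}(x^{-\tau}+\theta^{-\tau})^{\alpha+1}} \\
&= \frac{\tau\alpha\theta^{\tau}x^{\tau\alpha-1}}{(x^{\tau}+\theta^{\tau})^{\alpha+1}}.
\end{aligned}$$

The third line is obtained by the transformation $y = \lambda^{\tau}(x^{-\tau}+\theta^{-\tau})$. The
final line uses the fact that $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$. The result is an inverse
Burr distribution. Note that this distribution applies to a particular driver.
Another driver may have a different Weibull shape parameter $\tau$ and as well
that driver’s Weibull scale parameter $\Theta$ may have a different distribution and,
in particular, a different mean. □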


4.4.6 Frailty models 


An important type of mixture distribution is a frailty model. Although the
physical motivation for this particular type of mixture is originally from the
analysis of lifetime distributions in survival analysis, the resulting mathematical
convenience implies that the approach may also be viewed as a useful way
to generate new distributions by mixing.

We begin by introducing a frailty random variable $\Lambda > 0$ and define the
conditional hazard rate (given $\Lambda = \lambda$) of $X$ to be $h_{X|\Lambda}(x|\lambda) = \lambda a(x)$, where
$a(x)$ is a known function of $x$ [that is, $a(x)$ is to be specified in a particular
application]. The frailty is meant to quantify uncertainty associated with the
hazard rate, which by the above specification of the conditional hazard rate
acts in a multiplicative manner.

The conditional survival function of $X|\Lambda$ is therefore

$$S_{X|\Lambda}(x|\lambda) = \exp\left[-\int_0^x h_{X|\Lambda}(t|\lambda)\,dt\right] = e^{-\lambda A(x)},$$

where $A(x) = \int_0^x a(t)\,dt$. In order to specify the mixture distribution (that is,
the marginal distribution of $X$), we define the moment generating function
of the frailty random variable $\Lambda$ to be $M_\Lambda(t) = E(e^{t\Lambda})$. Then the marginal
survival function is

$$S_X(x) = E[e^{-\Lambda A(x)}] = M_\Lambda[-A(x)], \tag{4.5}$$

and obviously $F_X(x) = 1 - S_X(x)$.

The type of mixture to be used determines the choice of $a(x)$ and hence
$A(x)$. The most important subclass of the frailty models is the class of
exponential mixtures with $a(x) = 1$ and $A(x) = x$, so that $S_{X|\Lambda}(x|\lambda) = e^{-\lambda x}$, $x \ge 0$.
Other useful mixtures include Weibull mixtures with $a(x) = \gamma x^{\gamma-1}$ and
$A(x) = x^{\gamma}$.

Evaluation of the frailty distribution requires an expression for the moment
generating function $M_\Lambda(t)$ of $\Lambda$. The most common choice is gamma frailty,
but other choices such as inverse Gaussian frailty are also used.


Example 4.32 Let $\Lambda$ have a gamma distribution and let $X|\Lambda$ have a Weibull
distribution with conditional survival function $S_{X|\Lambda}(x|\lambda) = e^{-\lambda x^{\gamma}}$. Determine
the unconditional or marginal distribution of $X$.


In this case it follows from Example 3.15 that the gamma moment generating
function is $M_\Lambda(t) = (1 - \theta t)^{-\alpha}$, and from (4.5) it follows that $X$ has
survival function

$$S_X(x) = M_\Lambda(-x^{\gamma}) = (1 + \theta x^{\gamma})^{-\alpha}.$$

This is a Burr distribution (see Appendix A) with the usual parameter $\theta$
replaced by $\theta^{-1/\gamma}$. Note that when $\gamma = 1$ this is an exponential mixture
which is a Pareto distribution, considered previously in Example 4.29. □
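
Equation (4.5) reduces the frailty calculation to one evaluation of a moment generating function. The sketch below illustrates this for the gamma frailty of Example 4.32 and compares the result with a direct numerical expectation. The function names, the midpoint rule with an assumed truncation point, and the parameter values are all assumptions for illustration.

```python
import math

def gamma_mgf(t, alpha, theta):
    """Gamma moment generating function (1 - theta t)^(-alpha), for t < 1/theta."""
    return (1.0 - theta * t) ** (-alpha)

def frailty_survival(x, big_a, mgf):
    """Equation (4.5): S_X(x) = M_Lambda(-A(x))."""
    return mgf(-big_a(x))

def survival_by_integration(x, alpha, theta, gam, upper=40.0, steps=20_000):
    """Midpoint-rule value of E[exp(-Lambda x^gam)] for gamma-distributed Lambda."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        density = lam ** (alpha - 1) * math.exp(-lam / theta) / (theta ** alpha * math.gamma(alpha))
        total += math.exp(-lam * x ** gam) * density
    return total * h

alpha, theta, gam, x = 2.0, 0.5, 1.5, 1.2  # hypothetical values
closed_form = frailty_survival(x, lambda u: u ** gam, lambda t: gamma_mgf(t, alpha, theta))
burr_form = (1.0 + theta * x ** gam) ** (-alpha)
numeric = survival_by_integration(x, alpha, theta, gam)
print(closed_form, burr_form, numeric)
```

Swapping in a different `mgf` (for example an inverse Gaussian one) or a different `big_a` changes the frailty model without changing `frailty_survival`.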


As mentioned earlier, mixing tends to create heavy-tailed distributions
and in particular a mixture of distributions which all have decreasing hazard
rates also has a decreasing hazard rate. In Exercise 4.32 the reader is asked
to prove this fact for frailty models. For further details on frailty models, see
the book by Hougaard [63].


4.4.7 Splicing


Another method for creating a new distribution is by splicing. This approach 
is similar to mixing in that it might be believed that two or more separate 
processes are responsible for generating the losses. With mixing, the various
processes operate on subsets of the population. Once the subset is identified, 
a simple loss model suffices. For splicing, the processes differ with regard to 
the loss amount. That is, one model governs the behavior of losses in some 
interval of possible losses while other models cover the other intervals. The 
following definition makes this precise. 


Definition 4.33 A k-component spliced distribution has a density function
that can be expressed as follows:

$$f_X(x) = \begin{cases}
a_1 f_1(x), & c_0 < x < c_1, \\
a_2 f_2(x), & c_1 < x < c_2, \\
\quad\vdots & \quad\vdots \\
a_k f_k(x), & c_{k-1} < x < c_k.
\end{cases}$$

For $j = 1, \ldots, k$, each $a_j > 0$ and each $f_j(x)$ must be a legitimate density
function with all probability on the interval $(c_{j-1}, c_j)$. Also, $a_1 + \cdots + a_k = 1$.


Example 4.34 Demonstrate that Model 5 on Page 21 is a two-component
spliced model.


The density function is

$$f(x) = \begin{cases}
0.01, & 0 < x < 50, \\
0.02, & 50 < x < 75,
\end{cases}$$

and the spliced model is created by letting $f_1(x) = 0.02$, $0 < x < 50$, which
is a uniform distribution on the interval from 0 to 50, and $f_2(x) = 0.04$,
$50 < x < 75$, which is a uniform distribution on the interval from 50 to 75.
The coefficients are then $a_1 = 0.5$ and $a_2 = 0.5$. □
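
Definition 4.33 translates directly into a lookup over intervals. The following sketch is illustrative only; the function name `spliced_pdf`, the argument layout, and the half-open interval convention are assumptions. It reproduces the density of Example 4.34 and confirms that the total probability is 1.

```python
def spliced_pdf(x, breaks, weights, components):
    """Density of Definition 4.33: a_j * f_j(x) on the j-th interval, 0 elsewhere.

    breaks = [c_0, ..., c_k]; weights = [a_1, ..., a_k]; components = [f_1, ..., f_k].
    Intervals are treated as half-open [c_{j-1}, c_j), an implementation choice.
    """
    if abs(sum(weights) - 1.0) > 1e-12 or any(a <= 0 for a in weights):
        raise ValueError("weights must be positive and sum to 1")
    for j, (a, f) in enumerate(zip(weights, components)):
        if breaks[j] <= x < breaks[j + 1]:
            return a * f(x)
    return 0.0

# Example 4.34: two uniform components with equal weights.
breaks = [0.0, 50.0, 75.0]
weights = [0.5, 0.5]
components = [lambda x: 0.02, lambda x: 0.04]

# Each piece is constant, so height times width gives its probability.
total_mass = sum(
    spliced_pdf(0.5 * (breaks[j] + breaks[j + 1]), breaks, weights, components)
    * (breaks[j + 1] - breaks[j])
    for j in range(2)
)
print(spliced_pdf(25.0, breaks, weights, components),
      spliced_pdf(60.0, breaks, weights, components), total_mass)
```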


It was not necessary to use density functions and coefficients, but this is
one way to ensure that the result is a legitimate density function. When using
parametric models, the motivation for splicing is that the tail behavior may
be inconsistent with the behavior for small losses. For example, experience
(based on knowledge beyond that available in the current, perhaps small,
data set) may indicate that the tail follows the Pareto distribution, but there
is a positive mode more in keeping with the lognormal or inverse Gaussian
distributions. A second instance is when there is a large amount of data below
some value but a limited amount of information elsewhere. We may want to
use the empirical distribution (or a smoothed version of it) up to a certain
point and a parametric model beyond that value. The definition given above
is appropriate when the break points $c_0, \ldots, c_k$ are known in advance.

Another way to construct a spliced model is to use standard distributions
over the range from $c_0$ to $c_k$. Let $g_j(x)$ be the jth such density function.
Then, in Definition 4.33, replace $f_j(x)$ with $g_j(x)/[G(c_j) - G(c_{j-1})]$. This
formulation makes it easier to have the break points become parameters that
can be estimated.

Neither approach to splicing ensures that the resulting density function will
be continuous (that is, the components will meet at the break points). Such
a restriction could be added to the specification.


Example 4.35 Create a two-component spliced model using an exponential
distribution from 0 to $c$ and a Pareto distribution (using $\gamma$ in place of $\theta$) from
$c$ to $\infty$.


The basic format is

$$f_X(x) = \begin{cases}
a_1 \dfrac{\theta^{-1}e^{-x/\theta}}{1 - e^{-c/\theta}}, & 0 < x < c, \\[2ex]
a_2 \dfrac{\alpha\gamma^{\alpha}(x+\gamma)^{-\alpha-1}}{\gamma^{\alpha}(c+\gamma)^{-\alpha}}, & c < x < \infty.
\end{cases}$$

However, we must force the density function to integrate to 1. All that is
needed is to let $a_1 = v$ and $a_2 = 1 - v$. The spliced density function becomes

$$f_X(x) = \begin{cases}
v \dfrac{\theta^{-1}e^{-x/\theta}}{1 - e^{-c/\theta}}, & 0 < x < c, \\[2ex]
(1-v) \dfrac{\alpha(c+\gamma)^{\alpha}}{(x+\gamma)^{\alpha+1}}, & c < x < \infty,
\end{cases}
\qquad \theta, \alpha, \gamma, c > 0, \quad 0 < v < 1.$$

Figure 4.4 illustrates this density function using the values $c = 100$, $v = 0.6$,
$\theta = 100$, $\gamma = 200$, and $\alpha = 4$. It is clear that this density is not continuous. □


Fig. 4.4 Two-component spliced density.


4.4.8 Exercises


4.18 Let $X$ have cdf $F_X(x) = 1 - (1+x)^{-\alpha}$, $x, \alpha > 0$. Determine the pdf
and cdf of $Y = \theta X$.


4.19 (*) One hundred observed claims [...] were between 0 and 300, 3 were
between 300 and 350, [...], and the remaining 40 were above 600. For the next
three years, [...] from 1995, determine a range [...] 1998 (there is not enough
information [...]).


4.20 Let $X$ have the Pareto distribution. Determine the cdf of the transformed,
inverse, and inverse transformed distributions. Check Appendix A to
determine if any of these distributions have special names.


4.21 Let X have the loglogistic distribution. Demonstrate that the inverse 
distribution also has the loglogistic distribution. Therefore there is no need 
to identify a separate inverse loglogistic distribution. 


4.22 Let $Y$ have the lognormal distribution with parameters $\mu$ and $\sigma$. Let
$Z = \theta Y$. Show that $Z$ also has the lognormal distribution and therefore the
addition of a third parameter has not created a new distribution.


4.23 (*) Let $X$ have a Pareto distribution with parameters $\alpha$ and $\theta$. Let
$Y = \ln(1 + X/\theta)$. Determine the name of the distribution of $Y$ and its
parameters.


4.24 In [132], Venter noted that if $X$ has the transformed gamma distribution
and its scale parameter $\theta$ has an inverse transformed gamma distribution
(where the parameter $\tau$ is the same in both distributions) the resulting mixture
has the transformed beta distribution. Demonstrate that this is true.


4.25 (*) Let $N$ have a Poisson distribution with mean $\Lambda$. Let $\Lambda$ have a
gamma distribution with mean 1 and variance 2. Determine the unconditional
probability that $N = 1$.


4.26 (*) Given a value of $\Theta = \theta$, the random variable $X$ has an exponential
distribution with hazard rate function $h(x) = \theta$, a constant. The random variable
$\Theta$ has a uniform distribution on the interval (1,11). Determine $S_X(0.5)$
for the unconditional distribution.


4.27 (*) Let $N$ have a Poisson distribution with mean $\Lambda$. Let $\Lambda$ have a uniform
distribution on the interval (0,5). Determine the unconditional probability
that $N > 2$.


4.28 Determine the probability density function and the hazard rate of the
frailty distribution.


4.29 Suppose that $X|\Lambda$ has the Weibull survival function $S_{X|\Lambda}(x|\lambda) = e^{-\lambda x^{\gamma}}$,
$x > 0$, and $\Lambda$ has an exponential distribution. Demonstrate that the unconditional
distribution of $X$ is loglogistic.


4.30 Consider the exponential-inverse Gaussian frailty model with $a(x) =
\theta/(2\sqrt{1+\theta x})$, where $\theta > 0$.

(a) Verify that the conditional hazard rate $h_{X|\Lambda}(x|\lambda)$ of $X|\Lambda$ is indeed
a valid hazard rate.

(b) Determine the conditional survival function $S_{X|\Lambda}(x|\lambda)$.

(c) If $\Lambda$ has a gamma distribution with parameters $\theta = 1$ and $\alpha$ replaced
by $2\alpha$, determine the marginal or unconditional survival
function of $X$.

(d) Use (c) to argue that a given frailty model may arise from more
than one combination of conditional distributions of $X|\Lambda$ and frailty
distributions of $\Lambda$.


4.31 Suppose that $X$ has survival function $S_X(x) = 1 - F_X(x)$ given by (4.5).
Show that $S_1(x) = F_X(x)/[E(\Lambda)A(x)]$ is again a survival function of the form
given by (4.5), and identify the distribution of $\Lambda$ associated with $S_1(x)$.


4.32 Fix $s > 0$, and define an “Esscher-transformed” frailty random variable
$\Lambda_s$ with probability density function (or discrete probability mass function in
the discrete case) $f_{\Lambda_s}(\lambda) = e^{-s\lambda} f_\Lambda(\lambda)/M_\Lambda(-s)$, $\lambda > 0$.

(a) Show that $\Lambda_s$ has moment generating function

$$M_{\Lambda_s}(t) = E(e^{t\Lambda_s}) = \frac{M_\Lambda(t-s)}{M_\Lambda(-s)}.$$

(b) Define the cumulant generating function of $\Lambda$ to be

$$c_\Lambda(t) = \ln[M_\Lambda(t)],$$

and use (a) to prove that

$$c'_\Lambda(-s) = E(\Lambda_s) \quad\text{and}\quad c''_\Lambda(-s) = \mathrm{Var}(\Lambda_s).$$

(c) For the frailty model with survival function given by (4.5), prove
that the associated hazard rate may be expressed as $h_X(x) =
a(x)\,c'_\Lambda[-A(x)]$, where $c_\Lambda$ is defined in (b).

(d) Use (c) to show that

$$h'_X(x) = a'(x)\,c'_\Lambda[-A(x)] - [a(x)]^2\,c''_\Lambda[-A(x)].$$

(e) Prove using (d) that, if the conditional hazard rate $h_{X|\Lambda}(x|\lambda)$ is
nonincreasing in $x$, then $h_X(x)$ is also nonincreasing in $x$.


4.33 Write the density function for a two-component spliced model in which
the density function is proportional to a uniform density over the interval
from 0 to 1,000 and is proportional to an exponential density function from
1,000 to $\infty$. Ensure that the resulting density function is continuous.


4.34 Let $X$ have pdf $f(x) = \exp(-|x/\theta|)/(2\theta)$ for $-\infty < x < \infty$. Let $Y = e^X$.
Determine the pdf and cdf of $Y$.


4.35 (*) Losses in 1993 follow the density function $f(x) = 3x^{-4}$, $x > 1$,
where $x$ is the loss in millions of dollars. Inflation of 10% impacts all claims
uniformly from 1993 to 1994. Determine the cdf of losses for 1994 and use it
to determine the probability that a 1994 loss exceeds 2,200,000.


4.36 Consider the inverse Gaussian random variable $X$ with pdf (from Appendix A)

$$f(x) = \sqrt{\frac{\theta}{2\pi x^3}} \exp\left[-\frac{\theta}{2x}\left(\frac{x-\mu}{\mu}\right)^2\right], \qquad x > 0,$$

where $\theta > 0$ and $\mu > 0$ are parameters.

(a) Derive the pdf of the reciprocal inverse Gaussian random variable
$1/X$.

(b) Prove that the “joint” moment generating function of $X$ and $1/X$
is given by

$$M(t_1, t_2) = E\left(e^{t_1 X + t_2 X^{-1}}\right) = \sqrt{\frac{\theta}{\theta - 2t_2}} \exp\left[\frac{\theta - \sqrt{(\theta - 2\mu^2 t_1)(\theta - 2t_2)}}{\mu}\right],$$

where $t_1 < \theta/(2\mu^2)$ and $t_2 < \theta/2$.

(c) Use (b) to show that the moment generating function of $X$ is

$$M_X(t) = E(e^{tX}) = \exp\left[\frac{\theta}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^2}{\theta}t}\right)\right], \qquad t < \frac{\theta}{2\mu^2},$$

and that this agrees with the reparameterized result in Exercise
3.24.

(d) Use (b) to show that the reciprocal inverse Gaussian random variable
$1/X$ has moment generating function

$$M_{1/X}(t) = E\left(e^{tX^{-1}}\right) = \sqrt{\frac{\theta}{\theta - 2t}} \exp\left[\frac{\theta}{\mu}\left(1 - \sqrt{1 - \frac{2}{\theta}t}\right)\right], \qquad t < \frac{\theta}{2}.$$

Hence prove that $1/X$ has the same distribution as $Z_1 + Z_2$, where
$Z_1$ has a gamma distribution, $Z_2$ has an inverse Gaussian distribution,
and $Z_1$ is independent of $Z_2$. Also, identify the gamma and
inverse Gaussian parameters in this representation.

(e) Use (b) to show that

$$Z = \frac{1}{X}\left(\frac{X-\mu}{\mu}\right)^2$$

has a gamma distribution with parameters $\alpha = \frac{1}{2}$ and the usual
parameter $\theta$ (in Appendix A) replaced by $2/\theta$.


4.5 SELECTED DISTRIBUTIONS AND THEIR RELATIONSHIPS


4.5.1 Introduction 


There are many ways to organize distributions into groups. Families such 
as Pearson (12 types), Burr (12 types), Stoppa (5 types), and Dagum (11 
types) are discussed in Chapter 2 of [73]. The same distribution can appear 
in more than one system, indicating that there are many relations among 
the distributions beyond those presented here. The systems presented in the 
next subsection are particularly useful for actuarial modeling because all the 
members have support on the positive real line and all tend to be skewed to the 
right. For a comprehensive set of continuous distributions, the two volumes 
by Johnson, Kotz, and Balakrishnan [67] and [68] are a valuable reference. 
In addition, there are entire books devoted to single distributions (such as 
Arnold [5] for the Pareto distribution). 


4.5.2 Two parametric families 


As noted when defining parametric families, many of the distributions presented
in this section and in Appendix A are special cases of others. For
example, a Weibull distribution with $\tau = 1$ and $\theta$ arbitrary is an exponential
distribution. Through this process, many of our distributions can be organized
into groupings, as illustrated in Figures 4.5 and 4.6. The transformed
beta family includes two special cases of a different nature. The paralogistic
and inverse paralogistic distributions are created by setting the two nonscale
parameters of the Burr and inverse Burr distributions equal to each other
rather than to a specified value.


[Fig. 4.5: diagram of the transformed beta family; node labels surviving extraction include Transformed beta, Paralogistic, Inverse Pareto, and Pareto.]


Fig. 4.6 Transformed/inverse transformed gamma family.


4.5.3 Limiting distributions 


The classification in the preceding section involved distributions that are spe- 
cial cases of other distributions. Another way to relate distributions is to see 
what happens as parameters go to their limiting values of zero or infinity. 


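Example 4.36 Show that the transformed gamma distribution is a limiting
case of the transformed beta distribution as $\theta \to \infty$, $\alpha \to \infty$, and $\theta/\alpha^{1/\gamma} \to \xi$,
a constant.

The demonstration relies on two facts concerning limits:

$$\lim_{\alpha\to\infty} \frac{e^{-\alpha}\alpha^{\alpha-1/2}(2\pi)^{1/2}}{\Gamma(\alpha)} = 1 \tag{4.6}$$

and

$$\lim_{a\to\infty} \left(1 + \frac{x}{a}\right)^{a+b} = e^{x}. \tag{4.7}$$

The limit in (4.6) is known as Stirling’s formula and provides an approximation
for the gamma function. The limit in (4.7) is a standard result found in most
calculus texts.

To ensure that the ratio $\theta/\alpha^{1/\gamma}$ goes to a constant, it is sufficient to force it
to be constant as $\alpha$ and $\theta$ become larger and larger. This can be accomplished
by substituting $\xi\alpha^{1/\gamma}$ for $\theta$ in the transformed beta pdf and then letting
$\alpha \to \infty$. The first steps, which also include using Stirling’s formula to replace
two of the gamma function terms, are

$$\begin{aligned}
f(x) &= \frac{\Gamma(\alpha+\tau)\gamma x^{\gamma\tau-1}}{\Gamma(\alpha)\Gamma(\tau)\theta^{\gamma\tau}(1 + x^{\gamma}\theta^{-\gamma})^{\alpha+\tau}} \\
&= \frac{e^{-\alpha-\tau}(\alpha+\tau)^{\alpha+\tau-1/2}(2\pi)^{1/2}\gamma x^{\gamma\tau-1}}{e^{-\alpha}\alpha^{\alpha-1/2}(2\pi)^{1/2}\Gamma(\tau)(\xi\alpha^{1/\gamma})^{\gamma\tau}(1 + x^{\gamma}\xi^{-\gamma}\alpha^{-1})^{\alpha+\tau}} \\
&= \frac{e^{-\tau}[(\alpha+\tau)/\alpha]^{\alpha+\tau-1/2}\gamma x^{\gamma\tau-1}}{\Gamma(\tau)\xi^{\gamma\tau}\left[1 + (x/\xi)^{\gamma}/\alpha\right]^{\alpha+\tau}}.
\end{aligned}$$

The two limits

$$\lim_{\alpha\to\infty} \left(1 + \frac{\tau}{\alpha}\right)^{\alpha+\tau-1/2} = e^{\tau}, \qquad \lim_{\alpha\to\infty} \left[1 + \frac{(x/\xi)^{\gamma}}{\alpha}\right]^{\alpha+\tau} = e^{(x/\xi)^{\gamma}}$$

can be substituted to yield

$$\lim_{\alpha\to\infty} f(x) = \frac{\gamma x^{\gamma\tau-1}e^{-(x/\xi)^{\gamma}}}{\Gamma(\tau)\xi^{\gamma\tau}},$$

which is the pdf of the transformed gamma distribution. □


With a similar argument, the inverse transformed gamma distribution is
obtained by letting $\tau$ go to infinity instead of $\alpha$ (see Exercise 4.39).

Because the Burr distribution is a transformed beta distribution with $\tau = 1$,
its limiting case is the transformed gamma with $\tau = 1$ (using the parameterization
in the previous example), which is the Weibull distribution. Similarly,
the inverse Burr has the inverse Weibull as a limiting case. Finally, letting
$\tau = \gamma = 1$ shows that the limiting case for the Pareto distribution is the
exponential (and similarly for their inverse distributions).

As a final illustration of a limiting case, consider the transformed gamma
distribution as parameterized above. Let $\gamma^{-1}\sqrt{\xi^{\gamma}} \to \sigma$ and $\gamma^{-1}(\xi^{\gamma}\tau - 1) \to \mu$.
If this is done by letting $\tau \to \infty$ (so both $\gamma$ and $\xi$ must go to zero), the limiting
distribution will be lognormal.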
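
The limit in Example 4.36 can be observed numerically. The sketch below is a hedged illustration: the function names and the values of $\xi$, $\gamma$, $\tau$, and $x$ are assumptions, and the transformed beta pdf is evaluated on the log scale only to avoid overflow in the gamma functions. It holds $\theta/\alpha^{1/\gamma} = \xi$ fixed while $\alpha$ grows and prints the transformed beta density next to the transformed gamma density.

```python
import math

def transformed_beta_pdf(x, alpha, theta, gam, tau):
    """Transformed beta pdf, evaluated on the log scale to avoid overflow."""
    u = (x / theta) ** gam
    log_f = (math.lgamma(alpha + tau) - math.lgamma(alpha) - math.lgamma(tau)
             + math.log(gam) + tau * math.log(u) - math.log(x)
             - (alpha + tau) * math.log1p(u))
    return math.exp(log_f)

def transformed_gamma_pdf(x, tau, xi, gam):
    """Limiting pdf from Example 4.36: gam x^(gam tau - 1) e^(-(x/xi)^gam) / (Gamma(tau) xi^(gam tau))."""
    u = (x / xi) ** gam
    return gam * u ** tau * math.exp(-u) / (x * math.gamma(tau))

xi, gam, tau, x = 10.0, 1.5, 2.0, 7.0  # hypothetical values
for alpha in (1e2, 1e4, 1e6):
    theta = xi * alpha ** (1.0 / gam)   # keeps theta / alpha^(1/gam) equal to xi
    print(alpha, transformed_beta_pdf(x, alpha, theta, gam, tau))
print("limit", transformed_gamma_pdf(x, tau, xi, gam))
```

The printed values should approach the limiting density as $\alpha$ increases, with a discrepancy that shrinks roughly in proportion to $1/\alpha$.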

In Figure 4.7 some of the limiting and special case relationships are shown.
Other interesting facts about the various distributions are also given.⁴


⁴Thanks to Dave Clark of American Re-Insurance Company for creating this picture.


Fig. 4.7 Distributional relationships and characteristics.


4.5.4 Exercises 


4.37 For a Pareto distribution, let both $\alpha$ and $\theta$ go to infinity with the ratio
$\alpha/\theta$ held constant. Show that the result is an exponential distribution.


4.38 Determine the limiting distribution of the generalized Pareto distribution
as $\alpha$ and $\theta$ both go to infinity.


4.39 Show that as $\tau \to \infty$ in the transformed beta distribution the result is
the inverse transformed gamma distribution.


4.6 DISCRETE DISTRIBUTIONS 


4.6.1 Introduction 


The purpose of this section is to introduce a large class of counting distri- 
butions. Counting distributions are discrete distributions with probabilities 
only on the nonnegative integers; that is, probabilities are defined only at the 
points 0,1,2,3,4,.... In an insurance context, counting distributions describe 
the number of events such as losses to the insured or claims to the insurance 
company. With an understanding of both the number of claims and the size of 
claims, one can have a deeper understanding of a variety of issues surrounding 
insurance than if one has only information about total losses. The description 
of total losses in terms of numbers and amounts separately also allows one to 
address issues of modification of an insurance contract. Another reason for
separating numbers and amounts of claims is that models for the number of
claims are fairly easy to obtain and experience has shown that the commonly 
used distributions really do model the propensity to generate losses. 

We now formalize some of the notation that will be used for models for discrete
phenomena. The probability function (pf) $p_k$ denotes the probability
that exactly $k$ events (such as claims or losses) occur. Let $N$ be a random
variable representing the number of such events. Then

$$p_k = \Pr(N = k), \qquad k = 0, 1, 2, \ldots.$$

As a reminder, the probability generating function (pgf) of a discrete random
variable $N$ with pf $p_k$ is

$$P(z) = P_N(z) = E(z^N) = \sum_{k=0}^{\infty} p_k z^k. \tag{4.8}$$

As is true with the moment generating function, the pgf can be used to
generate moments. In particular, $P'(1) = E(N)$ and $P''(1) = E[N(N-1)]$
(see Exercise 4.42). To see that the pgf really does generate probabilities,

$$\begin{aligned}
P^{(m)}(z) &= E\left(\frac{d^m}{dz^m} z^N\right) = E[N(N-1)\cdots(N-m+1)z^{N-m}] \\
&= \sum_{k=m}^{\infty} k(k-1)\cdots(k-m+1)z^{k-m}p_k,
\end{aligned}$$

$$P^{(m)}(0) = m!\,p_m \quad\text{or}\quad p_m = \frac{P^{(m)}(0)}{m!}.$$


4.6.2 The Poisson distribution

The pf for the Poisson distribution is

$$p_k = \frac{e^{-\lambda}\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots.$$

The probability generating function from Example 3.16 is

$$P(z) = e^{\lambda(z-1)}, \qquad \lambda > 0.$$

The mean and variance can be computed from the probability generating
function as follows:

$$\begin{aligned}
E(N) &= P'(1) = \lambda, \\
E[N(N-1)] &= P''(1) = \lambda^2, \\
\mathrm{Var}(N) &= E[N(N-1)] + E(N) - [E(N)]^2 \\
&= \lambda^2 + \lambda - \lambda^2 \\
&= \lambda.
\end{aligned}$$

For the Poisson distribution the variance is equal to the mean. The Poisson
distribution can arise from a Poisson process (to be discussed in Chapter 8).
The Poisson distribution and Poisson processes are also discussed in many
textbooks in statistics and actuarial science, including Panjer and Willmot
[106] and Ross [116].
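
These moment results are easy to confirm numerically from the pf itself. The following is a minimal sketch; the function names, the truncation of the support at 100 terms, and the value $\lambda = 2.3$ are assumptions for illustration. It builds the Poisson probabilities recursively, computes the mean and variance by direct summation, and compares the truncated pgf (4.8) with $e^{\lambda(z-1)}$.

```python
import math

def poisson_pf(lam, k_max):
    """Return [p_0, ..., p_k_max] using the recursion p_k = p_{k-1} * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs

def pgf(probs, z):
    """Probability generating function (4.8), truncated at len(probs) terms."""
    return sum(p * z ** k for k, p in enumerate(probs))

lam = 2.3  # hypothetical Poisson mean
probs = poisson_pf(lam, 100)
mean = sum(k * p for k, p in enumerate(probs))
factorial_moment = sum(k * (k - 1) * p for k, p in enumerate(probs))  # E[N(N-1)]
variance = factorial_moment + mean - mean ** 2
print(mean, variance, pgf(probs, 0.7), math.exp(lam * (0.7 - 1.0)))
```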

The Poisson distribution has at least two additional useful properties. The 
first is given in the following theorem. 


Theorem 4.37 Let $N_1, \ldots, N_n$ be independent Poisson variables with parameters
$\lambda_1, \ldots, \lambda_n$. Then $N = N_1 + \cdots + N_n$ has a Poisson distribution with
parameter $\lambda_1 + \cdots + \lambda_n$.


Proof: The pgf of the sum of independent random variables is the product
of the individual pgfs. For the sum of Poisson random variables we have

$$\begin{aligned}
P_N(z) &= \prod_{j=1}^{n} P_{N_j}(z) = \prod_{j=1}^{n} \exp[\lambda_j(z-1)] \\
&= \exp\left[\sum_{j=1}^{n} \lambda_j(z-1)\right] \\
&= e^{\lambda(z-1)},
\end{aligned}$$

where $\lambda = \lambda_1 + \cdots + \lambda_n$. Just as is true with moment generating functions, the
pgf is unique and therefore $N$ must have a Poisson distribution with parameter
$\lambda$. □
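
Theorem 4.37 can also be seen without generating functions by convolving two Poisson probability functions directly. The sketch below is illustrative; the function names and the parameters $\lambda_1 = 1.2$ and $\lambda_2 = 0.7$ are assumptions. It compares $\Pr(N_1 + N_2 = k)$, obtained by summation, with the Poisson pf with parameter $\lambda_1 + \lambda_2$.

```python
import math

def poisson_pf(k, lam):
    """Poisson probability e^(-lam) lam^k / k!, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def convolve_pf(k, lam1, lam2):
    """Pr(N1 + N2 = k) for independent Poisson N1 and N2, by direct summation."""
    return sum(poisson_pf(i, lam1) * poisson_pf(k - i, lam2) for i in range(k + 1))

lam1, lam2 = 1.2, 0.7  # hypothetical parameters
for k in range(5):
    print(k, convolve_pf(k, lam1, lam2), poisson_pf(k, lam1 + lam2))
```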


The second property is particularly useful in modeling insurance risks. Sup- 
pose that the number of claims in a fixed time period, such as one year, follows 
a Poisson distribution. Further suppose that the claims can be classified into 
m distinct types. For example, claims could be classified by size, such as 
those below a fixed limit and those above the limit. It turns out that, if one is 
interested in studying the number of claims above the limit, that distribution 
is also Poisson but with a new Poisson parameter. 

This is also useful when considering removing or adding a part of an insur- 
ance coverage. Suppose that the number of claims for a complicated medical 
benefit coverage follows a Poisson distribution. Consider the “types” of claims 
to be the different medical procedures or medical benefits under the plan. If 
one of the benefits is removed from the plan, again it turns out that the distri- 
bution of the number of claims under the revised plan will still have a Poisson 
distribution but with a new parameter. 

In each of the cases mentioned in the previous paragraph, the number of 
claims of the different types will not only be Poisson distributed but also be 
independent of each other; that is, the distributions of the number of claims 
above the limit and the number below the limit will be independent. This
is a somewhat surprising result. For example, suppose we currently sell a
policy with a deductible of 50 and experience has indicated that a Poisson 
distribution with a certain parameter is a valid model for the number of 
payments. Further suppose we are also comfortable with the assumption that 
the number of losses in a period also has the Poisson distribution but we do 
not know the parameter. Without additional information, it is impossible to 
infer the value of the Poisson parameter should the deductible be lowered or 
removed entirely. We now formalize these ideas in the following theorem. 


Theorem 4.38 Suppose that the number of events N is a Poisson random 
variable with mean A. Further suppose that each event can be classified into 

one of m types with probabilities pi,...,Dm independent of all other events. 
Then the number of events Ny,...,Nm corresponding to event types 1,...,m 
respectively, are mutually independent Poisson random variables with means 
Api, -+-;ADm; respectively. 


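Proof: For fixed $N = n$, the conditional joint distribution of $(N_1,\ldots,N_m)$
is multinomial with parameters $(n, p_1,\ldots,p_m)$. Also, for fixed $N = n$, the
conditional marginal distribution of $N_j$ is binomial with parameters $(n, p_j)$.
The joint pf of $(N_1,\ldots,N_m)$ is given by

$$
\begin{aligned}
\Pr(N_1 = n_1,\ldots,N_m = n_m)
&= \Pr(N_1 = n_1,\ldots,N_m = n_m \mid N = n)\times\Pr(N = n) \\
&= \frac{n!}{n_1!\,n_2!\cdots n_m!}\,p_1^{n_1}\cdots p_m^{n_m}\,\frac{e^{-\lambda}\lambda^{n}}{n!} \\
&= \prod_{j=1}^{m} \frac{e^{-\lambda p_j}(\lambda p_j)^{n_j}}{n_j!},
\end{aligned}
$$

where $n = n_1 + n_2 + \cdots + n_m$. Similarly, the marginal pf of $N_j$ is

$$
\begin{aligned}
\Pr(N_j = n_j)
&= \sum_{n=n_j}^{\infty} \Pr(N_j = n_j \mid N = n)\Pr(N = n) \\
&= \sum_{n=n_j}^{\infty} \binom{n}{n_j} p_j^{n_j}(1-p_j)^{n-n_j}\,\frac{e^{-\lambda}\lambda^{n}}{n!} \\
&= e^{-\lambda}\,\frac{(\lambda p_j)^{n_j}}{n_j!} \sum_{n=n_j}^{\infty} \frac{[\lambda(1-p_j)]^{n-n_j}}{(n-n_j)!} \\
&= e^{-\lambda}\,\frac{(\lambda p_j)^{n_j}}{n_j!}\,e^{\lambda(1-p_j)} \\
&= \frac{e^{-\lambda p_j}(\lambda p_j)^{n_j}}{n_j!}.
\end{aligned}
$$

Hence the joint pf is the product of the marginal pfs, establishing mutual
independence. □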
Example 4.39 In a study of medical insurance the expected number of claims
per individual policy is 2.3 and the number of claims is Poisson distributed.
You are considering removing one medical procedure from the coverage under
this policy. Based on historical studies, this procedure accounts for approximately
10% of the claims. Determine the new frequency distribution.


From this it follows that the mean and variance of the negative binomial 
distribution are 


$$
E(N) = r\beta \quad\text{and}\quad \mathrm{Var}(N) = r\beta(1+\beta).
$$


Because $\beta$ is positive, the variance of the negative binomial distribution
exceeds the mean. This is in contrast to the Poisson distribution for which 
the variance is equal to the mean. This suggests that for a particular set of 
data, if the observed variance is larger than the observed mean, the negative 
binomial might be a better candidate than the Poisson distribution as a model 
to be used. 

The negative binomial distribution is a generalization of the Poisson in 
at least two different ways, namely as a mixed Poisson distribution with a 
gamma mixing distribution (demonstrated later in this subsection) and as 
a compound Poisson distribution with a logarithmic secondary distribution 
(see Section 4.6.7). Another view of the Poisson distribution is presented in 
Chapter 8. There, among other assumptions, the rate at which claims occur 
is assumed constant over time. If the rate is linearly increasing with regard to 
the number of past claims, then the number of claims in any period will have 
the negative binomial distribution. See Insurance Risk Models [106, Theorem 
3.6.1] for this derivation of the negative binomial distribution. 

The geometric distribution is the special case of the negative binomial 
distribution when r = 1. The geometric distribution is, in some senses, the 
discrete analogue of the continuous exponential distribution. Both the geo- 
metric and exponential distributions have an exponentially decaying proba- 
bility function and hence the memoryless property. The memoryless property 
can be interpreted in various contexts as follows. If the exponential distribu- 
tion is a distribution of lifetimes, then the expected future lifetime is constant 
for any age. If the exponential distribution describes the size of insurance 
claims, then the memoryless property can be interpreted as follows: Given 
that a claim exceeds a certain level d, the expected amount of the claim in 
excess of d is constant and so does not depend on d. That is, if a deductible 
of d is imposed, the expected payment per claim will be unchanged, but of 
course the expected number of payments will decrease. If the geometric dis- 
tribution describes the number of claims, then the memoryless property can 
be interpreted as follows: Given that there are at least m claims, the proba- 
bility distribution of the number of claims in excess of m does not depend on 
m. Among continuous distributions, the exponential distribution is used to 
distinguish between subexponential distributions with heavy (or fat) tails and 
distributions with light (or thin) tails. Similarly for frequency distributions, 
distributions that decay in the tail slower than the geometric distribution are 
often considered to have long tails, whereas distributions that decay more 
rapidly than the geometric have short tails. The negative binomial distribu- 
tion has a long tail (decays more slowly than the geometric distribution) when 
r <1 and a lighter tail than the geometric distribution when r > 1. 
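
The memoryless property of the geometric distribution can be checked numerically. The following is only an illustrative sketch: it assumes the geometric probability function $p_k = \beta^k/(1+\beta)^{k+1}$ (the negative binomial with $r = 1$), and the function names, the truncation point, and the value $\beta = 2$ are arbitrary choices made for the illustration.

```python
def geometric_pf(k, beta):
    """Geometric probability of exactly k claims (negative binomial with r = 1)."""
    return (1.0 / (1.0 + beta)) * (beta / (1.0 + beta)) ** k


def excess_pf(k, m, beta, k_max=500):
    """Pr(N - m = k | N >= m), with the tail sum truncated at k_max (an assumption)."""
    tail = sum(geometric_pf(j, beta) for j in range(m, k_max))
    return geometric_pf(m + k, beta) / tail


beta = 2.0  # illustrative value, not from the text
for m in (0, 3, 10):
    # Each row should repeat the unconditional probabilities p_0, p_1, p_2, p_3.
    print(m, [round(excess_pf(k, m, beta), 6) for k in range(4)])
```

If the property holds, the printed rows agree regardless of $m$, which is what "does not depend on $m$" means in the paragraph above.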


From Theorem 4.38, we know that the distribution of the number of claims 
expected under the revised insurance policy after removing the procedure 
from coverage is Poisson with mean 0.9(2.3) = 2.07. In carrying out studies 
of the distribution of total claims, and hence the appropriate premium under 

the new policy, one also needs to study the change in the amounts of losses,
the severity distribution, because the distribution of amounts of losses for
the procedure which was removed may be different from the distribution of
amounts when all procedures are covered. □
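
The thinning result of Theorem 4.38 can be verified numerically for the figures of Example 4.39. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the infinite sum over the total claim count is truncated at `n_max = 100`, which is more than enough for a mean of 2.3.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def thinned_pmf(k, lam, p, n_max=100):
    """Pr(k retained claims) when N ~ Poisson(lam) and each claim is kept with prob. p.

    Conditions on N = n (binomial number of retained claims) and sums over n.
    """
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) * poisson_pmf(n, lam)
        for n in range(k, n_max + 1)
    )


lam, keep = 2.3, 0.9  # Example 4.39: the removed procedure is 10% of claims
for k in range(5):
    # The two columns should agree: thinned probability vs. Poisson(2.07).
    print(k, round(thinned_pmf(k, lam, keep), 6), round(poisson_pmf(k, lam * keep), 6))
```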


4.6.3 The negative binomial distribution 


The negative binomial distribution has been used extensively as an alternative 
to the Poisson distribution. Like the Poisson distribution, it has positive 
probabilities on the nonnegative integers. Because it has two parameters, it 
has more flexibility in shape than the Poisson. 


Definition 4.40 The probability function of the negative binomial distri- 
bution is given by 


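$$
p_k = \binom{k+r-1}{k}\left(\frac{1}{1+\beta}\right)^{r}\left(\frac{\beta}{1+\beta}\right)^{k},
\quad k = 0,1,2,\ldots,\ r > 0,\ \beta > 0. \qquad (4.9)
$$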


The binomial coefficient is to be evaluated as 


$$
\binom{x}{k} = \frac{x(x-1)\cdots(x-k+1)}{k!}.
$$

While $k$ must be an integer, $x$ may be any real number. When $x > k-1$, it
can also be written as

$$
\binom{x}{k} = \frac{\Gamma(x+1)}{\Gamma(k+1)\,\Gamma(x-k+1)},
$$

which may be useful because $\ln\Gamma(x)$ is available in many spreadsheets,
programming languages, and mathematics packages.
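
As an illustration of the log-gamma route, the sketch below evaluates the probability function (4.9) using `math.lgamma` from the Python standard library. The function name `neg_binomial_pf` and the parameter values are assumptions chosen for the example, not part of the text.

```python
import math


def neg_binomial_pf(k, r, beta):
    """Negative binomial probability p_k from (4.9).

    The coefficient C(k + r - 1, k) = Gamma(k + r) / (Gamma(k + 1) Gamma(r))
    is evaluated on the log scale, so r may be any positive real number.
    """
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    log_p = log_coeff - r * math.log1p(beta) + k * math.log(beta / (1.0 + beta))
    return math.exp(log_p)


r, beta = 2.5, 0.5  # illustrative parameter values
probs = [neg_binomial_pf(k, r, beta) for k in range(200)]
print(sum(probs))                                # should be very close to 1
print(sum(k * p for k, p in enumerate(probs)))   # should be close to r * beta
```

Working on the log scale avoids overflow in the gamma functions when $k$ or $r$ is large.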


It is not difficult to show that the probability generating function for the 
negative binomial distribution is 


$$
P(z) = [1 - \beta(z-1)]^{-r}.
$$
As noted earlier, one way to create the negative binomial distribution is
as a mixture of Poissons. Suppose that we know that a risk has a Poisson
number of claims distribution when the risk parameter $\lambda$ is known. Now
treat $\lambda$ as being the outcome of a random variable $\Lambda$. We will denote the
pf of $\Lambda$ by $u(\lambda)$, where $\Lambda$ may be continuous or discrete, and denote the
cdf by $U(\lambda)$. The idea that $\lambda$ is the outcome of a random variable can be
justified in several ways. First, we can think of the population of risks as
being heterogeneous with respect to the risk parameter $\Lambda$. In practice this
makes sense. Consider a block of insurance policies with the same premium, 
such as a group of automobile drivers in the same rating category. Such 
categories are usually broad ranges such as 0-7,500 miles per year, garaged 
in a rural area, commuting less than 50 miles per week, and so on. We know 
that not all drivers in the same rating category are the same even though they 
may “appear” to be the same from the point of view of the insurer and are 
charged the same premium. The parameter $\lambda$ measures the expected number
of accidents. If $\lambda$ varies across the population of drivers, then we can think of
the insured individual as a sample value drawn from the population of possible
drivers. This means implicitly that $\lambda$ is unknown to the insurer but follows
some distribution, in this case $u(\lambda)$, over the population of drivers. The true
value of $\lambda$ is unobservable. All we observe is the number of accidents coming
from the driver. There is now an additional degree of uncertainty, that is, 
uncertainty about the parameter. 

This is the same mixing process that was discussed with regard to con- 
tinuous distributions in Section 4.4.5. In some contexts this is referred to 
as parameter uncertainty. In the Bayesian context, the distribution of $\Lambda$ is
called a prior distribution and the parameters of its distribution are sometimes
called hyperparameters. The role of the distribution $u(\cdot)$ is very important in
credibility theory, the subject of Chapter 16. When the parameter $\Lambda$ is unknown,
the probability that exactly $k$ claims will arise can be written as the
expected value of the same probability but conditional on $\Lambda = \lambda$, where the
expectation is taken with respect to the distribution of $\Lambda$. From the law of
total probability, we can write 


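$$
\begin{aligned}
p_k = \Pr(N = k) &= E[\Pr(N = k \mid \Lambda)] \\
&= \int_0^{\infty} \Pr(N = k \mid \Lambda = \lambda)\,u(\lambda)\,d\lambda \\
&= \int_0^{\infty} \frac{e^{-\lambda}\lambda^{k}}{k!}\,u(\lambda)\,d\lambda.
\end{aligned}
$$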


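Now suppose $\Lambda$ has a gamma distribution. Then

$$
p_k = \int_0^{\infty} \frac{e^{-\lambda}\lambda^{k}}{k!}\,
\frac{\lambda^{\alpha-1}e^{-\lambda/\theta}}{\theta^{\alpha}\Gamma(\alpha)}\,d\lambda
= \frac{1}{k!\,\theta^{\alpha}\Gamma(\alpha)} \int_0^{\infty} e^{-\lambda(1+1/\theta)}\lambda^{k+\alpha-1}\,d\lambda.
$$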
From the definition of the gamma distribution in Appendix A, this expression
can be evaluated as

$$
\begin{aligned}
p_k &= \frac{\Gamma(k+\alpha)}{k!\,\Gamma(\alpha)}\,\frac{\theta^{k}}{(1+\theta)^{k+\alpha}} \\
&= \binom{k+\alpha-1}{k}\left(\frac{\theta}{1+\theta}\right)^{k}\left(\frac{1}{1+\theta}\right)^{\alpha}.
\end{aligned}
$$

This formula is of the same form as (4.9), demonstrating that the mixed
Poisson, with a gamma mixing distribution, is the same as a negative binomial
distribution.
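
The equivalence can also be checked numerically. The sketch below is illustrative only: it integrates the Poisson probability against a gamma density with composite Simpson's rule and compares the result with (4.9), taking $r = \alpha$ and $\beta = \theta$. The function names, the integration range, the step count, and the parameter values are assumptions, and the density as coded requires $\alpha \ge 1$ so that it is finite at zero.

```python
import math


def gamma_pdf(lam, alpha, theta):
    """Gamma density with shape alpha and scale theta (alpha >= 1 assumed)."""
    return lam ** (alpha - 1) * math.exp(-lam / theta) / (theta**alpha * math.gamma(alpha))


def mixed_poisson_pf(k, alpha, theta, upper=40.0, steps=20000):
    """Approximate the mixing integral by composite Simpson's rule (steps must be even)."""
    def integrand(lam):
        return math.exp(-lam) * lam**k / math.factorial(k) * gamma_pdf(lam, alpha, theta)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def neg_binomial_pf(k, r, beta):
    """Negative binomial probability (4.9) via log-gamma functions."""
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coeff - r * math.log1p(beta) + k * math.log(beta / (1 + beta)))


alpha, theta = 2.5, 0.5  # illustrative values
for k in range(4):
    print(k, round(mixed_poisson_pf(k, alpha, theta), 6), round(neg_binomial_pf(k, alpha, theta), 6))
```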
It is worth noting that the Poisson distribution is a limiting case of the 
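negative binomial distribution. To see this, let $r$ go to infinity and $\beta$ go to
zero while keeping their product constant. Let $\lambda = r\beta$ be that constant.
Substitution of $\beta = \lambda/r$ in the pgf leads to (using L'Hôpital's rule in lines 3 and
5 below)

$$
\begin{aligned}
\lim_{r\to\infty}\left[1 - \frac{\lambda(z-1)}{r}\right]^{-r}
&= \exp\left\{\lim_{r\to\infty} -r\ln\left[1 - \frac{\lambda(z-1)}{r}\right]\right\} \\
&= \exp\left\{-\lim_{r\to\infty} \frac{\ln[1 - \lambda(z-1)/r]}{r^{-1}}\right\} \\
&= \exp\left\{\lim_{r\to\infty} \frac{[1 - \lambda(z-1)/r]^{-1}\lambda(z-1)/r^{2}}{r^{-2}}\right\} \\
&= \exp\left\{\lim_{r\to\infty} \frac{r\lambda(z-1)}{r - \lambda(z-1)}\right\} \\
&= \exp\left\{\lim_{r\to\infty} [\lambda(z-1)]\right\} \\
&= \exp[\lambda(z-1)],
\end{aligned}
$$

which is the pgf of the Poisson distribution.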


4.6.4 The binomial distribution 


The binomial distribution is another counting distribution that arises natu- 
rally in claim number modeling. It possesses some properties different from 
the Poisson and the negative binomial that make it particularly useful. First,
its variance is smaller than its mean. This makes it useful for data sets in
which the observed sample variance is less than the sample mean. This
contrasts with the negative binomial, where the variance exceeds the mean, and
it also contrasts with the Poisson distribution, where the variance is equal to 
the mean. 

Second, it describes a physical situation in which m risks are each subject to 
claim or loss. We can formalize this as follows. Consider $m$ independent and
identical risks each with probability $q$ of making a claim. This might apply to
a life insurance situation in which all the individuals under consideration are
in the same mortality class; that is, they may all be male smokers at age 35 
and duration 5 of an insurance policy. In that case, q is the probability that 
a person with those attributes will die in the next year. Then the number 
of claims for a single person follows a Bernoulli distribution, a distribution 
with probability 1 — q at 0 and probability q at 1. The probability generating 
function of the number of claims per individual is then given by 


$$
P(z) = (1-q)z^{0} + qz^{1} = 1 + q(z-1).
$$


Now, if there are m such independent individuals, then the probability gener- 
ating functions can be multiplied together to give the probability generating 
function of the total number of claims arising from the group of m individuals. 
That probability generating function is 


$$
P(z) = [1 + q(z-1)]^{m}, \quad 0 < q < 1.
$$


Then from this it is easy to show that the probability of exactly k claims from 
the group is 


$$
p_k = \Pr(N = k) = \binom{m}{k} q^{k}(1-q)^{m-k}, \quad k = 0,1,\ldots,m,
$$


the pf for a binomial distribution with parameters m and q. From this 
Bernoulli trial framework, it is clear that at most m events (claims) can oc- 
cur. Hence, the distribution only has positive probabilities on the nonnegative 
integers up to and including m. 
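
The step from the product of pgfs to the binomial probabilities can be made concrete by multiplying polynomials. The sketch below is illustrative only; the helper names and the values $m = 5$, $q = 0.1$ are assumptions for the example.

```python
import math


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of z)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def group_claim_pf(m, q):
    """Coefficients of [1 + q(z - 1)]^m, that is, Pr(N = k) for k = 0, ..., m."""
    pgf = [1.0]
    for _ in range(m):
        pgf = poly_mul(pgf, [1.0 - q, q])  # Bernoulli pgf: (1 - q) + q z
    return pgf


m, q = 5, 0.1  # illustrative values
pf = group_claim_pf(m, q)
for k, p in enumerate(pf):
    # Second column: coefficient from the pgf product; third: binomial formula.
    print(k, round(p, 8), round(math.comb(m, k) * q**k * (1 - q) ** (m - k), 8))
```

The coefficient list has exactly $m + 1$ entries, which reflects the finite support of the distribution.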

Consequently, an additional attribute of the binomial distribution that is 
sometimes useful is that it has finite support; that is, the range of values for 
which there exist positive probabilities has finite length. This may be useful, 
for instance, in modeling the number of individuals injured in an automobile 
accident or the number of family members covered under a health insurance 
policy. In each case it is reasonable to have an upper limit on the range 
of possible values. It is useful also in connection with situations where it is 
believed that it is unreasonable to assign positive probabilities beyond some 
point. For example, if one is modeling the number of accidents per automobile 
during a one-year period, it is probably physically impossible for there to be 
more than some number, say 12, of claims during the year given the time 
it would take to repair the automobile between accidents. If a model with 
probabilities that extend beyond 12 were used, those probabilities should be 
very small so that they have little impact on any decisions that are made. 
The mean and variance of the binomial distribution are given by 


$$
E(N) = mq, \quad \mathrm{Var}(N) = mq(1-q).
$$
Table 4.1 Members of the (a, b, 0) class

Distribution        a            b                  p_0
Poisson             0            λ                  e^{-λ}
Binomial            -q/(1-q)     (m+1) q/(1-q)      (1-q)^m
Negative binomial   β/(1+β)      (r-1) β/(1+β)      (1+β)^{-r}
Geometric           β/(1+β)      0                  (1+β)^{-1}


4.6.5 The (a, b, 0) class 


The following definition characterizes the members of this class of distribu- 
tions. 


Definition 4.41 Let $p_k$ be the pf of a discrete random variable. It is a member
of the (a, b, 0) class of distributions, provided that there exist constants
$a$ and $b$ such that

$$
\frac{p_k}{p_{k-1}} = a + \frac{b}{k}, \quad k = 1,2,3,\ldots.
$$


This recursion describes the relative size of successive probabilities in the 
counting distribution. The probability at zero, po, can be obtained from the 
recursive formula because the probabilities must add up to 1. This provides 
a boundary condition. The (a,b,0) class of distributions is a two-parameter 
class, the two parameters being a and b. By substituting in the probability 
function for each of the Poisson, binomial, and negative binomial distributions 
on the left-hand side of the recursion, it can be seen that each of these three 
distributions satisfies the recursion and that the values of a and b are as 
given in Table 4.1. In addition the table gives the value of po, the starting 
value for the recursion. Also in the table is the geometric distribution, the 
one-parameter special case (r = 1) of the negative binomial distribution. 

It can be shown (see Panjer and Willmot [106, Chapter 6]) that these are 
the only possible distributions satisfying this recursive formula. 
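
The recursion $p_k = (a + b/k)p_{k-1}$ translates directly into code. The sketch below is illustrative: the function name `ab0_probabilities` and all numerical parameter values are assumptions, while the expressions for $a$, $b$, and $p_0$ are those listed in Table 4.1.

```python
import math


def ab0_probabilities(a, b, p0, k_max):
    """Return [p_0, ..., p_{k_max}] from the recursion p_k = (a + b/k) p_{k-1}."""
    probs = [p0]
    for k in range(1, k_max + 1):
        probs.append((a + b / k) * probs[-1])
    return probs


# Poisson: a = 0, b = lambda, p_0 = exp(-lambda)
lam = 2.0
poisson = ab0_probabilities(0.0, lam, math.exp(-lam), 10)

# Negative binomial: a = beta/(1+beta), b = (r-1) beta/(1+beta), p_0 = (1+beta)^(-r)
r, beta = 2.5, 0.5
negbin = ab0_probabilities(beta / (1 + beta), (r - 1) * beta / (1 + beta), (1 + beta) ** (-r), 10)

# Binomial: a = -q/(1-q), b = (m+1) q/(1-q), p_0 = (1-q)^m; support stops at m
m, q = 4, 0.3
binomial = ab0_probabilities(-q / (1 - q), (m + 1) * q / (1 - q), (1 - q) ** m, m)

print([round(p, 6) for p in poisson[:4]])
print([round(p, 6) for p in negbin[:4]])
print([round(p, 6) for p in binomial])
```

For the binomial case the factor $a + b/k$ is zero at $k = m + 1$, so the recursion would terminate the support on its own.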

The recursive formula can be rewritten as 


$$
k\,\frac{p_k}{p_{k-1}} = ak + b, \quad k = 1,2,3,\ldots.
$$


The expression on the left-hand side is a linear function in $k$. Note from
Table 4.1 that the slope $a$ of the straight line is 0 for the Poisson distribution, is
negative for the binomial distribution, and is positive for the negative binomial
distribution, including the geometric. This suggests a graphical way of
indicating which of the three distributions might be selected for fitting. First, 


Table 4.2 Accident profile

Number of       Number of
accidents, k    policies, n_k    k n_k / n_{k-1}
0               7,840
1               1,317            0.17
2                 239            0.36
3                  42            0.53
4                  14            1.33
5                   4            1.43
6                   4            6.00
7                   1            1.75
8+                  0
Total           9,461
one can plot

$$
k\,\frac{\hat{p}_k}{\hat{p}_{k-1}} = k\,\frac{n_k}{n_{k-1}}
$$

against $k$. The observed values should form approximately a straight line if
one of these models is to be selected, and the value of the slope should be an
indication of which of the models should be selected. Note that this cannot
be done if any of the $n_k$ are 0. Hence this procedure is less useful for a small
number of observations.


Example 4.42 Consider the accident data in Table 4.2, which is taken from 
Thyrion [128]. For the 9,461 automobile insurance policies studied, the num- 
ber of accidents under the policy is recorded in the table. Also recorded in the 
table is the observed value of the quantity that should be linear. 
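
The quantity $k\,n_k/n_{k-1}$ is simple to compute. The sketch below is illustrative only; the function name is an assumption, and the list of policy counts is the one that reproduces the stated total of 9,461 policies and the ratios shown in Table 4.2.

```python
def successive_ratios(counts):
    """Return k * n_k / n_{k-1} for k = 1, 2, ...; every n_{k-1} must be positive."""
    return [k * counts[k] / counts[k - 1] for k in range(1, len(counts))]


# Policy counts n_0, ..., n_7 for the accident data of Example 4.42
counts = [7840, 1317, 239, 42, 14, 4, 4, 1]

for k, ratio in enumerate(successive_ratios(counts), start=1):
    print(k, round(ratio, 2))
```

A roughly increasing sequence points to a positive slope, but the last few ratios rest on single-digit counts and should be given little weight.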


Figure 4.8 plots the value of the quantity of interest against k, the number 
of accidents. It can be seen from the graph that the quantity of interest
looks approximately linear except for the point at k = 6. The reliability of 
the quantities as k increases diminishes because the number of observations 
becomes small and the variability of the results grows. This illustrates the 
weakness of this ad hoc procedure. Visually, all the points appear to have 
equal value. However, the points on the left are more reliable than the points 
on the right. From the graph, it can be seen that the slope is positive and 
the data appear approximately linear. This suggests the negative binomial 
distribution is an appropriate model. Whether or not the slope is significantly 
different from 0 is also not easily judged from the graph. By rescaling the 
vertical axis of the graph, the slope can be made to look steeper and hence the 
slope could be made to appear to be significantly different from 0. Graphically, 
it is difficult to distinguish between the Poisson and the negative binomial 


DISCRETE DISTRIBUTIONS 83 


Fig. 4.8 Plot of the ratio $k n_k / n_{k-1}$ against $k$.


distribution because the Poisson requires a slope of 0. However, we can say 
that the binomial distribution is probably not a good choice since there is no 
evidence of a negative slope. In this case it is advisable to fit both the Poisson 
and negative binomial distributions and conduct a more formal test to choose 
between them. 

It is also possible to compare the appropriateness of the distributions by 
looking at the relationship of the variance to the mean. For this data set, the 
mean number of claims per policy is 0.2144. The variance is 0.2889. Because 
the variance exceeds the mean, the negative binomial should be considered as 
an alternative to the Poisson. Again this is a qualitative comment because 
we have, at this point, no formal way of determining whether the variance 
is sufficiently larger than the mean to warrant use of the negative binomial. 
In order to do some formal analysis, Table 4.3 gives the results of maximum 
likelihood estimation (to be discussed in Chapter 12) of the parameters of the 
Poisson and negative binomial distributions and the negative loglikelihood in 
each case. In Chapter 13 formal selection methods are presented. They would 
indicate that the negative binomial is superior to the Poisson as a model for 
this data set. However, those methods also indicate that the negative binomial 
is not a particularly good model, and thus some of the distributions yet to be 
introduced should be considered. 
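
The summary statistics quoted in this example can be reproduced in a few lines. The sketch below is illustrative only: the names are assumptions, the counts are those consistent with the stated total of 9,461 policies, and the variance is computed with divisor $n$. Because the Poisson maximum likelihood estimate equals the sample mean (a standard fact), the negative loglikelihood of the Poisson fit can be evaluated as well.

```python
import math

# Policy counts n_0, ..., n_7 for the accident data of Example 4.42
counts = [7840, 1317, 239, 42, 14, 4, 4, 1]

n = sum(counts)
mean = sum(k * c for k, c in enumerate(counts)) / n
second_moment = sum(k * k * c for k, c in enumerate(counts)) / n
variance = second_moment - mean**2  # divisor n, an assumption


def poisson_neg_loglik(lam, counts):
    """Negative Poisson loglikelihood for grouped count data."""
    return -sum(
        c * (-lam + k * math.log(lam) - math.lgamma(k + 1))
        for k, c in enumerate(counts)
    )


print(round(mean, 4), round(variance, 4))
print(round(poisson_neg_loglik(mean, counts), 2))
```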

In subsequent subsections we will expand the class of the distributions 
beyond the three discussed in this section by constructing more general models 
related to the Poisson, binomial, and negative binomial distributions. 


4.6.6 Truncation and modification at zero


At times, the distributions discussed previously do not adequately describe 
the characteristics of some data sets encountered in practice. This may be 
Table 4.3 Poisson-negative binomial comparison

                     Parameter
Distribution         estimates          -Loglikelihood
Poisson              λ̂ = 0.2143537      5,490.78
Negative binomial    β̂ = 0.3055594      5,348.04
                     r̂ = 0.7015122


because the tail of the negative binomial is not heavy enough or because the 
distributions in the (a,b, 0) class cannot capture the shape of the data set in 
some other part of the distribution. 

In this section, we address the problem of a poor fit at the left-hand end 
of the distribution, in particular the probability at zero. 

For insurance count data, the probability at zero is the probability that 
no claims occur during the period under study. For applications in insurance 
where the probability of occurrence of a loss is low, the probability at zero 
has the largest value. Thus, it is important to pay special attention to the fit 
at this point. 

There are also situations that naturally occur which generate unusually 
large probabilities at zero. Consider the case of group dental insurance. If, in 
a family, both husband and wife have coverage with their employer-sponsored 
plans and both group insurance contracts provide coverage for all family mem- 
bers, the claims will be made to the insurer of the plan which provides the 
better benefits, and no claims may be made under the other contract. Then, in 
conducting studies for a specific insurer, one may find a higher than expected 
number of individuals who made no claim. 

Similarly, it is possible to have situations in which there is less than the 
expected number, or even zero, occurrences at zero. For example, if one 
is counting the number of claims from accidents resulting in a claim, the 
minimum observed value is 1. 

An adjustment of the probability at zero is easily handled for the Poisson, 
binomial, and negative binomial distributions. 


Definition 4.43 Let $p_k$ be the pf of a discrete random variable. It is a member
of the (a, b, 1) class of distributions provided that there exist constants
$a$ and $b$ such that

$$
\frac{p_k}{p_{k-1}} = a + \frac{b}{k}, \quad k = 2,3,4,\ldots.
$$


Note that the only difference from the (a, b, 0) class is that the recursion
begins at $p_1$ rather than $p_0$. This forces the distribution from $k = 1$ to $k = \infty$
to have the same shape as the (a, b, 0) class in the sense that the probabilities
are the same up to a constant of proportionality because $\sum_{k=1}^{\infty} p_k$ can be set
to any number in the interval (0, 1]. The remaining probability is at $k = 0$.
We will distinguish between the situations in which $p_0 = 0$ and those
where po > 0. The first subclass is called the truncated (more specifically, 
zero-truncated) distributions. The members are the zero-truncated Poisson, 
zero-truncated binomial, and zero-truncated negative binomial (and its special 
case, the zero-truncated geometric) distributions. 
The second subclass will be referred to as the zero-modified distributions 
because the probability is modified from that for the (a,b,0) class. These 
distributions can be viewed as a mixture of an (a,b,0) distribution and a 
degenerate distribution with all the probability at zero. Alternatively, they 
can be called truncated with zeros distributions since the distribution can 
be viewed as a mixture of a truncated distribution and a degenerate distri- 
bution with all the probability at zero. We now show this more formally. 
Note that all zero-truncated distributions can be considered as zero-modified 
distributions, with the particular modification being to set po = 0. 
With three types of distributions, notation can become confusing. When
writing about discrete distributions in general, we will continue to let $p_k =
\Pr(N = k)$. When referring to a zero-truncated distribution, we will use $p_k^T$,
and when referring to a zero-modified distribution, we will use $p_k^M$. Once
again, it is possible for a zero-modified distribution to be a zero-truncated
distribution.
Let $P(z) = \sum_{k=0}^{\infty} p_k z^k$ denote the pgf of a member of the (a, b, 0) class.
Let $P^M(z) = \sum_{k=0}^{\infty} p_k^M z^k$ denote the pgf of the corresponding member of the
(a, b, 1) class; that is,

$$
p_k^M = c\,p_k, \quad k = 1,2,3,\ldots,
$$

and $p_0^M$ is an arbitrary number. Then

$$
\begin{aligned}
P^M(z) &= p_0^M + \sum_{k=1}^{\infty} p_k^M z^k \\
&= p_0^M + c\sum_{k=1}^{\infty} p_k z^k \\
&= p_0^M + c[P(z) - p_0].
\end{aligned}
$$

Because $P^M(1) = P(1) = 1$,

$$
1 = p_0^M + c(1 - p_0),
$$

resulting in

$$
c = \frac{1 - p_0^M}{1 - p_0} \quad\text{or}\quad p_0^M = 1 - c(1 - p_0).
$$
This relationship is necessary to ensure that the $p_k^M$ sum to 1. We then have

$$
\begin{aligned}
P^M(z) &= p_0^M + \frac{1 - p_0^M}{1 - p_0}[P(z) - p_0] \\
&= \left(1 - \frac{1 - p_0^M}{1 - p_0}\right)1 + \frac{1 - p_0^M}{1 - p_0}P(z). \qquad (4.10)
\end{aligned}
$$

This is a weighted average of the pgfs of the degenerate distribution and
the corresponding (a, b, 0) member. Furthermore,

$$
p_k^M = \frac{1 - p_0^M}{1 - p_0}\,p_k, \quad k = 1,2,\ldots. \qquad (4.11)
$$


Let $P^T(z)$ denote the pgf of the zero-truncated distribution corresponding to
an (a, b, 0) pgf $P(z)$. Then, by setting $p_0^M = 0$ in (4.10) and (4.11),

$$
P^T(z) = \frac{P(z) - p_0}{1 - p_0}
$$

and

$$
p_k^T = \frac{p_k}{1 - p_0}, \quad k = 1,2,\ldots. \qquad (4.12)
$$

Then from (4.11)

$$
p_k^M = (1 - p_0^M)\,p_k^T, \quad k = 1,2,\ldots, \qquad (4.13)
$$

and

$$
P^M(z) = p_0^M(1) + (1 - p_0^M)P^T(z). \qquad (4.14)
$$

Then the zero-modified distribution is also the weighted average of a degenerate
distribution and the zero-truncated member of the (a, b, 0) class. The
following example illustrates these relationships.


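Example 4.44 Consider a negative binomial random variable with parameters
$\beta = 0.5$ and $r = 2.5$. Determine the first four probabilities for this random
variable. Then determine the corresponding probabilities for the zero-truncated
and zero-modified (with $p_0^M = 0.6$) versions.

From Table 4.4 we have, for the negative binomial distribution,

$$
\begin{aligned}
p_0 &= (1 + 0.5)^{-2.5} = 0.362887, \\
a &= \frac{0.5}{1.5} = \frac{1}{3}, \\
b &= \frac{(2.5 - 1)(0.5)}{1.5} = \frac{1}{2}.
\end{aligned}
$$

The first three recursions are

$$
\begin{aligned}
p_1 &= 0.362887\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{1}\right) = 0.302406, \\
p_2 &= 0.302406\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{2}\right) = 0.176404, \\
p_3 &= 0.176404\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{3}\right) = 0.088202.
\end{aligned}
$$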



For the zero-truncated random variable, $p_0^T = 0$ by definition. The recursions
start with [from (4.12)] $p_1^T = 0.302406/(1 - 0.362887) = 0.474651$.
Then

$$
\begin{aligned}
p_2^T &= 0.474651\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{2}\right) = 0.276880, \\
p_3^T &= 0.276880\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{3}\right) = 0.138440.
\end{aligned}
$$

If the original values were all available, then the zero-truncated probabilities
could have all been obtained by multiplying the original values by $1/(1 -
0.362887) = 1.569580$.
For the zero-modified random variable, $p_0^M = 0.6$ arbitrarily. From (4.11),
$p_1^M = (1 - 0.6)(0.302406)/(1 - 0.362887) = 0.189860$. Then

$$
\begin{aligned}
p_2^M &= 0.189860\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{2}\right) = 0.110752, \\
p_3^M &= 0.110752\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{3}\right) = 0.055376.
\end{aligned}
$$

In this case, each original negative binomial probability has been multiplied
by $(1 - 0.6)/(1 - 0.362887) = 0.627832$. Also note that, for $j \ge 1$, $p_j^M =
0.4\,p_j^T$. □
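
The arithmetic of Example 4.44 can be reproduced with a short script. This is a sketch for illustration: the function name and its return layout are assumptions, while the relationships used are the (a, b, 0) recursion together with (4.12) and (4.13).

```python
def zero_truncated_and_modified(r, beta, p0_modified, k_max):
    """Return (p, p_T, p_M) for a negative binomial and its ZT and ZM versions."""
    a = beta / (1 + beta)
    b = (r - 1) * beta / (1 + beta)

    # Ordinary negative binomial probabilities from the (a, b, 0) recursion
    p = [(1 + beta) ** (-r)]
    for k in range(1, k_max + 1):
        p.append((a + b / k) * p[-1])

    # Zero-truncated: rescale by 1/(1 - p_0), as in (4.12)
    p_t = [0.0] + [pk / (1 - p[0]) for pk in p[1:]]

    # Zero-modified: weight the truncated values by 1 - p_0^M, as in (4.13)
    p_m = [p0_modified] + [(1 - p0_modified) * pk for pk in p_t[1:]]
    return p, p_t, p_m


p, p_t, p_m = zero_truncated_and_modified(r=2.5, beta=0.5, p0_modified=0.6, k_max=3)
print([round(x, 6) for x in p])
print([round(x, 6) for x in p_t])
print([round(x, 6) for x in p_m])
```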


Although we have only discussed the zero-modified distributions of the
(a, b, 0) class, the (a, b, 1) class admits additional distributions. The (a, b)
parameter space can be expanded to admit an extension of the negative binomial
distribution to include cases where $-1 < r < 0$. For the (a, b, 0) class,
$r > 0$ is required. By adding the additional region to the parameter space, the
"extended" truncated negative binomial (ETNB) distribution has parameter
restrictions $\beta > 0$, $r > -1$, $r \ne 0$.

To show that the recursive equation

$$
p_k = p_{k-1}\left(a + \frac{b}{k}\right), \quad k = 2,3,\ldots, \qquad (4.15)
$$

with $p_0 = 0$ defines a proper distribution, it is sufficient to show that for any
value of $p_1$, the successive values of $p_k$ obtained recursively are each positive
and that $\sum_{k=1}^{\infty} p_k < \infty$. For the ETNB, this must be done for the parameter
space

$$
a = \frac{\beta}{1+\beta}, \quad \beta > 0,
\qquad
b = (r-1)\frac{\beta}{1+\beta}, \quad r > -1,\ r \ne 0
$$

(see Exercise 4.44).
When $r \to 0$, the limiting case of the ETNB is the logarithmic distribution
with

$$
p_k^T = \frac{[\beta/(1+\beta)]^{k}}{k\ln(1+\beta)}, \quad k = 1,2,3,\ldots \qquad (4.16)
$$

(see Exercise 4.45). The pgf of the logarithmic distribution is

$$
P^T(z) = 1 - \frac{\ln[1 - \beta(z-1)]}{\ln(1+\beta)} \qquad (4.17)
$$

(see Exercise 4.46). The zero-modified logarithmic distribution is created by
assigning an arbitrary probability at zero and reducing the remaining
probabilities.

It is also interesting that the special extreme case with $-1 < r < 0$ and
$\beta \to \infty$ is a proper distribution, and is sometimes called the Sibuya distribution.
It has pgf $P^T(z) = 1 - (1-z)^{-r}$ and no moments exist (see Exercise 4.47).
Distributions with no moments are not particularly interesting for modeling
claim numbers (unless the right tail is subsequently modified) because then
an infinite number of claims are expected. This might be difficult to price!


Example 4.45 Determine the probabilities for an ETNB distribution with
$r = -0.5$ and $\beta = 1$. Do this both for the truncated version and for the
modified version with $p_0^M = 0.6$ set arbitrarily.

We have $a = 1/(1+1) = 0.5$ and $b = (-0.5 - 1)(1)/(1+1) = -0.75$. From
Appendix B we also have $p_1^T = -0.5(1)/[(1+1)^{0.5} - (1+1)] = 0.853553$.
Subsequent values are

$$
\begin{aligned}
p_2^T &= \left(0.5 - \frac{0.75}{2}\right)(0.853553) = 0.106694, \\
p_3^T &= \left(0.5 - \frac{0.75}{3}\right)(0.106694) = 0.026674.
\end{aligned}
$$

For the modified probabilities, the truncated probabilities need to be multiplied
by 0.4 to produce $p_1^M = 0.341421$, $p_2^M = 0.042678$, and $p_3^M = 0.010670$.
A reasonable question is to ask if there is a "natural" member of the ETNB
distribution, that is, one for which the recursion would begin with $p_1$ rather
than $p_0$. For that to be the case, the natural value of $p_0$ would have to satisfy
$p_1 = (0.5 - 0.75/1)p_0 = -0.25p_0$. This would force one of the two probabilities
to be negative and so there is no acceptable solution. It is easy to show that
this occurs for any $r < 0$. □
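
The ETNB calculation of Example 4.45 can be sketched as follows. The function name is an assumption, and the starting value $p_1^T = r\beta/[(1+\beta)^{r+1} - (1+\beta)]$ is the general form suggested by the arithmetic quoted from Appendix B in the example, so it should be read as an inferred formula rather than a quotation.

```python
def etnb_truncated(r, beta, k_max):
    """Zero-truncated ETNB probabilities [p_0^T, ..., p_{k_max}^T] (r > -1, r != 0)."""
    a = beta / (1 + beta)
    b = (r - 1) * beta / (1 + beta)
    # Starting value inferred from the arithmetic in Example 4.45
    p1 = r * beta / ((1 + beta) ** (r + 1) - (1 + beta))
    probs = [0.0, p1]
    for k in range(2, k_max + 1):
        probs.append((a + b / k) * probs[-1])  # recursion (4.15)
    return probs


p_t = etnb_truncated(r=-0.5, beta=1.0, k_max=3)
p0_modified = 0.6
p_m = [p0_modified] + [(1 - p0_modified) * x for x in p_t[1:]]

print([round(x, 6) for x in p_t])
print([round(x, 6) for x in p_m])
```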


There are no other members of the (a, b, 1) class beyond the ones discussed 
above. A summary is given in Table 4.4. 


4.6.7 Compound frequency models 


A larger class of distributions can be created by the process of compounding
any two discrete distributions. The term compounding reflects the idea that
the pgf of the new distribution $P(z)$ is written as

$$
P(z) = P_N[P_M(z)], \qquad (4.18)
$$
Table 4.4 Members of the (a, b, 1) class

Distribution(a)      p_0           a           b                 Parameter space
Poisson              e^{-λ}        0           λ                 λ > 0
ZT Poisson           0             0           λ                 λ > 0
ZM Poisson           Arbitrary     0           λ                 λ > 0
Binomial             (1-q)^m       -q/(1-q)    (m+1) q/(1-q)     0 < q < 1
ZT binomial          0             -q/(1-q)    (m+1) q/(1-q)     0 < q < 1
ZM binomial          Arbitrary     -q/(1-q)    (m+1) q/(1-q)     0 < q < 1
Negative binomial    (1+β)^{-r}    β/(1+β)     (r-1) β/(1+β)     r > 0, β > 0
ETNB                 0             β/(1+β)     (r-1) β/(1+β)     r > -1,(b) β > 0
ZM ETNB              Arbitrary     β/(1+β)     (r-1) β/(1+β)     r > -1,(b) β > 0
Geometric            (1+β)^{-1}    β/(1+β)     0                 β > 0
ZT geometric         0             β/(1+β)     0                 β > 0
ZM geometric         Arbitrary     β/(1+β)     0                 β > 0
Logarithmic          0             β/(1+β)     -β/(1+β)          β > 0
ZM logarithmic       Arbitrary     β/(1+β)     -β/(1+β)          β > 0

(a) ZT = zero truncated, ZM = zero modified.
(b) Excluding r = 0, which is the logarithmic distribution.


where $P_N(z)$ and $P_M(z)$ are called the primary and secondary distributions,
respectively.
The compound distributions arise naturally as follows. Let $N$ be a counting
random variable with pgf $P_N(z)$. Let $M_1, M_2, \ldots$ be identically and independently
distributed random variables with pgf $P_M(z)$. Assuming that the $M_j$s
do not depend on $N$, the pgf of the random sum $S = M_1 + M_2 + \cdots + M_N$
(where $N = 0$ implies that $S = 0$) is $P_S(z) = P_N[P_M(z)]$. This is shown as

$$
\begin{aligned}
P_S(z) &= \sum_{k=0}^{\infty} \Pr(S = k)z^{k}
= \sum_{k=0}^{\infty}\sum_{n=0}^{\infty} \Pr(S = k \mid N = n)\Pr(N = n)z^{k} \\
&= \sum_{n=0}^{\infty} \Pr(N = n)\sum_{k=0}^{\infty} \Pr(M_1 + \cdots + M_n = k \mid N = n)z^{k} \\
&= \sum_{n=0}^{\infty} \Pr(N = n)[P_M(z)]^{n} \\
&= P_N[P_M(z)].
\end{aligned}
$$


In insurance contexts, this distribution can arise naturally. If N represents the 
number of accidents arising in a portfolio of risks and $\{M_k;\ k = 1,2,\ldots,N\}$
represents the number of claims (injuries, number of cars, etc.) from the
accidents, then S represents the total number of claims from the portfolio. 
This kind of interpretation is not necessary to justify the use of a compound 
distribution. If a compound distribution fits data well, that may be enough 
justification itself. Also, there are other motivations for these distributions, 
as presented in Section 4.6.12. 


Example 4.46 Demonstrate that any zero-modified distribution is a compound
pound distribution. 


Consider a primary Bernoulli distribution. It has pgf $P_N(z) = 1 - q + qz$.
Then consider an arbitrary secondary distribution with pgf $P_M(z)$. Then,
from (4.18) we obtain

$$
P_S(z) = P_N[P_M(z)] = 1 - q + qP_M(z).
$$

From (4.10) this is the pgf of a ZM distribution with

$$
q = \frac{1 - p_0^M}{1 - p_0}.
$$

That is, the ZM distribution has assigned arbitrary probability $p_0^M$ at zero,
while $p_0$ is the probability assigned at zero by the secondary distribution. □


Example 4.47 Consider the case where both $M$ and $N$ have the Poisson
distribution. Determine the pgf of this distribution.

This distribution is called the Poisson-Poisson or Neyman Type A distribution.
Let $P_N(z) = e^{\lambda_1(z-1)}$ and $P_M(z) = e^{\lambda_2(z-1)}$. Then

$$
P_S(z) = e^{\lambda_1[e^{\lambda_2(z-1)} - 1]}.
$$

When $\lambda_2$ is a lot larger than $\lambda_1$ (for example, $\lambda_1 = 0.1$ and $\lambda_2 = 10$), the
resulting distribution will have two local modes. □
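
The two local modes can be located numerically. The sketch below is illustrative only: it conditions on $N = n$ and uses the standard fact that a sum of $n$ independent Poisson($\lambda_2$) variables is Poisson($n\lambda_2$). The function names and the truncation point `n_max` are assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of k, computed on the log scale; lam = 0 is degenerate at 0."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def neyman_type_a_pf(k, lam1, lam2, n_max=60):
    """Pr(S = k) for a Poisson(lam1) primary and Poisson(lam2) secondary distribution."""
    return sum(
        poisson_pmf(n, lam1) * poisson_pmf(k, n * lam2) for n in range(n_max + 1)
    )


lam1, lam2 = 0.1, 10.0  # the values suggested in Example 4.47
g = [neyman_type_a_pf(k, lam1, lam2) for k in range(40)]

# A local mode is a value larger than both of its neighbors (k = 0 has one neighbor).
modes = [
    k for k in range(len(g) - 1)
    if (k == 0 or g[k] > g[k - 1]) and g[k] > g[k + 1]
]
print(modes)
```

One mode sits at zero, reflecting the high probability of no primary events; the other lies near $\lambda_2$, the typical size of a single cluster.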
The probability of exactly $k$ claims can be written as

$$
\begin{aligned}
\Pr(S = k) &= \sum_{n=0}^{\infty} \Pr(S = k \mid N = n)\Pr(N = n) \\
&= \sum_{n=0}^{\infty} \Pr(M_1 + \cdots + M_N = k \mid N = n)\Pr(N = n) \\
&= \sum_{n=0}^{\infty} \Pr(M_1 + \cdots + M_n = k)\Pr(N = n). \qquad (4.19)
\end{aligned}
$$


Letting $g_n = \Pr(S = n)$, $p_n = \Pr(N = n)$, and $f_n = \Pr(M = n)$, this is
rewritten as

$$
g_k = \sum_{n=0}^{\infty} p_n f_k^{*n} \qquad (4.20)
$$

where $f_k^{*n}$, $k = 0,1,\ldots$, is the "n-fold convolution" of the function $f_k$, $k =
0,1,\ldots$, that is, the probability that the sum of $n$ random variables which are
each independent and identically distributed (i.i.d.) with probability function
$f_k$ will take on value $k$.

When Py(z) is chosen to be a member of the (a, b, 0) class, 


$$
p_k = \left(a + \frac{b}{k}\right)p_{k-1}, \quad k = 1,2,\ldots, \qquad (4.21)
$$


and a simple recursive formula can be used. This formula avoids the use of 
convolutions and thus reduces the computations considerably. 


Theorem 4.48 If the primary distribution is a member of the (a, b, 0) class,
the recursive formula is

$$g_k = \frac{1}{1 - af_0} \sum_{j=1}^{k} \left(a + \frac{bj}{k}\right) f_j g_{k-j}, \quad k = 1, 2, 3, \ldots. \qquad (4.22)$$


Proof: From (4.21),

$$np_n = a(n-1)p_{n-1} + (a+b)p_{n-1}.$$

Multiplying each side by $[P_M(z)]^{n-1}P_M'(z)$ and summing over n yields

$$\sum_{n=1}^{\infty} np_n[P_M(z)]^{n-1}P_M'(z) = a\sum_{n=1}^{\infty}(n-1)p_{n-1}[P_M(z)]^{n-1}P_M'(z) + (a+b)\sum_{n=1}^{\infty}p_{n-1}[P_M(z)]^{n-1}P_M'(z).$$

Because $P_S(z) = \sum_{n=0}^{\infty} p_n[P_M(z)]^n$, the previous equation is

$$P_S'(z) = a\sum_{n=0}^{\infty} np_n[P_M(z)]^{n}P_M'(z) + (a+b)\sum_{n=0}^{\infty}p_n[P_M(z)]^{n}P_M'(z).$$

Therefore

$$P_S'(z) = aP_S'(z)P_M(z) + (a+b)P_S(z)P_M'(z).$$

Each side can be expanded in powers of z. The coefficients of $z^{k-1}$ in such
an expansion must be the same on both sides of the equation. Hence, for
k = 1,2,... we have

$$
\begin{aligned}
kg_k &= a\sum_{j=0}^{k}(k-j)f_jg_{k-j} + (a+b)\sum_{j=0}^{k}jf_jg_{k-j} \\
&= akf_0g_k + a\sum_{j=1}^{k}(k-j)f_jg_{k-j} + (a+b)\sum_{j=1}^{k}jf_jg_{k-j} \\
&= akf_0g_k + ak\sum_{j=1}^{k}f_jg_{k-j} + b\sum_{j=1}^{k}jf_jg_{k-j}.
\end{aligned}
$$

Therefore,

$$g_k = af_0g_k + \sum_{j=1}^{k}\left(a + \frac{bj}{k}\right)f_jg_{k-j}.$$

Rearrangement yields (4.22). □


In order to use (4.22), the starting value go is required and is given in 
Theorem 4.51 below. If the primary distribution is a member of the (a,b, 1) 
class, the proof must be modified to reflect the fact that the recursion for the 
primary distribution begins at k = 2. The result is the following. 


Theorem 4.49 If the primary distribution is a member of the (a, b, 1) class,
the recursive formula is

$$g_k = \frac{[p_1 - (a+b)p_0]f_k + \sum_{j=1}^{k}\left(a + \frac{bj}{k}\right)f_jg_{k-j}}{1 - af_0}, \quad k = 1, 2, 3, \ldots. \qquad (4.23)$$


Proof: It is similar to the proof of Theorem 4.48 and is left to the reader. □


Example 4.50 Develop the recursive formula for the case where the primary 
distribution is Poisson. 


In this case a = 0 and $b = \lambda$, yielding the recursive form

$$g_k = \frac{\lambda}{k}\sum_{j=1}^{k} jf_jg_{k-j}.$$

The starting value is, from (4.18),

$$
\begin{aligned}
g_0 &= \Pr(S = 0) = P(0) \\
&= P_N[P_M(0)] = P_N(f_0) \\
&= e^{-\lambda(1-f_0)}. \qquad (4.24)
\end{aligned}
$$

Distributions of this type are called compound Poisson distributions. When
the secondary distribution is specified, the compound distribution is called
Poisson-X, where X is the name of the secondary distribution. □


The method used to obtain go applies to any compound distribution. 


Theorem 4.51 For any compound distribution, $g_0 = P_N(f_0)$, where $P_N(z)$ is
the pgf of the primary distribution and $f_0$ is the probability that the secondary
distribution takes on the value zero.


Proof: See the second line of (4.24). □


Note that the secondary distribution is not required to be in any special
form. However, to keep the number of distributions manageable, secondary
distributions will be selected from the (a, b, 0) or the (a, b, 1) class.


Example 4.52 Calculate the probabilities for the Poisson-ETNB distribution
where $\lambda = 3$ for the Poisson distribution and the ETNB distribution has
$r = -0.5$ and $\beta = 1$.


From Example 4.45 the secondary probabilities are $f_0 = 0$, $f_1 = 0.853553$,
$f_2 = 0.106694$, and $f_3 = 0.026674$. From (4.24), $g_0 = \exp[-3(1 - 0)] =
0.049787$. For the Poisson primary distribution, a = 0 and b = 3. The
recursive formula in (4.22) becomes

$$g_k = \frac{\sum_{j=1}^{k}(3j/k)f_jg_{k-j}}{1 - 0(0)} = \sum_{j=1}^{k}\frac{3j}{k}f_jg_{k-j}.$$

Then,

$$
\begin{aligned}
g_1 &= \frac{3(1)}{1}\,0.853553(0.049787) = 0.127488, \\
g_2 &= \frac{3(1)}{2}\,0.853553(0.127488) + \frac{3(2)}{2}\,0.106694(0.049787) = 0.179163, \\
g_3 &= \frac{3(1)}{3}\,0.853553(0.179163) + \frac{3(2)}{3}\,0.106694(0.127488)
      + \frac{3(3)}{3}\,0.026674(0.049787) = 0.184114.
\end{aligned}
$$
□
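
The recursion (4.22) is straightforward to program. The sketch below is one possible implementation under stated assumptions: the function name `compound_ab0_pmf` is hypothetical, the caller supplies the starting value $g_0 = P_N(f_0)$ from Theorem 4.51, and secondary probabilities beyond the supplied list are treated as zero, so with only $f_0,\ldots,f_3$ available the output is meaningful only up to $g_3$.

```python
import math


def compound_ab0_pmf(a, b, g0, f, k_max):
    """Probabilities g_0..g_k_max of a compound distribution via (4.22).

    a, b : (a, b, 0) parameters of the primary distribution
    g0   : starting value P_N(f_0), supplied by the caller
    f    : secondary probabilities f_0, f_1, ...; entries beyond the end
           of the list are treated as zero
    """
    g = [g0]
    for k in range(1, k_max + 1):
        total = 0.0
        for j in range(1, min(k, len(f) - 1) + 1):
            total += (a + b * j / k) * f[j] * g[k - j]
        g.append(total / (1.0 - a * f[0]))
    return g


# Example 4.52: Poisson primary (a = 0, b = lambda = 3) with the ETNB
# secondary probabilities quoted in the text.
lam = 3.0
f = [0.0, 0.853553, 0.106694, 0.026674]
g = compound_ab0_pmf(0.0, lam, math.exp(-lam * (1.0 - f[0])), f, 3)
print([round(x, 6) for x in g])
```

Each $g_k$ costs at most k multiplications, so the first n probabilities take on the order of $n^2$ operations, compared with the repeated convolutions implied by (4.20).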
Example 4.53 Demonstrate that the Poisson-logarithmic distribution is a
negative binomial distribution.

The negative binomial distribution has pgf

$$P(z) = [1 - \beta(z-1)]^{-r}.$$

Suppose $P_N(z)$ is Poisson($\lambda$) and $P_M(z)$ is logarithmic($\beta$); then

$$
\begin{aligned}
P_N[P_M(z)] &= \exp\{\lambda[P_M(z) - 1]\} \\
&= \exp\left\{\lambda\left[1 - \frac{\ln[1 - \beta(z-1)]}{\ln(1+\beta)} - 1\right]\right\} \\
&= \exp\left\{\frac{-\lambda}{\ln(1+\beta)}\ln[1 - \beta(z-1)]\right\} \\
&= [1 - \beta(z-1)]^{-\lambda/\ln(1+\beta)} \\
&= [1 - \beta(z-1)]^{-r},
\end{aligned}
$$

where $r = \lambda/\ln(1+\beta)$. This shows that the negative binomial distribution can
be written as a compound Poisson distribution with a logarithmic secondary
distribution. □


The above example shows that the “Poisson—logarithmic” distribution does 
not create a new distribution beyond the (a,b,0) and (a,b,1) classes. As a 
result, this combination of distributions is not useful to us. Another combi- 
nation which does not create a new distribution beyond the (a,b, 1) class is 
the compound geometric distribution where both the primary and secondary 
distributions are geometric. The resulting distribution is a zero-modified
geometric distribution, as shown in Exercise 4.51. The following theorem shows
that certain other combinations are also of no use in expanding the class of
distributions through compounding. Suppose $P_S(z) = P_N[P_M(z); \theta]$ as before.
Now, $P_M(z)$ can always be written as

$$P_M(z) = f_0 + (1 - f_0)P_M^T(z) \qquad (4.25)$$

where $P_M^T(z)$ is the pgf of the conditional distribution over the positive range
(in other words, the zero-truncated version).


Theorem 4.54 Suppose the pgf $P_N(z; \theta)$ satisfies

$$P_N(z; \theta) = B[\theta(z-1)]$$

for some parameter $\theta$ and some function B(z) which is independent of $\theta$. That
is, the parameter $\theta$ and the argument z only appear in the pgf as $\theta(z-1)$. There
may be other parameters as well, and they may appear anywhere in the pgf.
Then $P_S(z) = P_N[P_M(z); \theta]$ can be rewritten as

$$P_S(z) = P_N[P_M^T(z); \theta(1 - f_0)].$$

Proof:

$$
\begin{aligned}
P_S(z) &= P_N[P_M(z); \theta] \\
&= P_N[f_0 + (1 - f_0)P_M^T(z); \theta] \\
&= B\{\theta[f_0 + (1 - f_0)P_M^T(z) - 1]\} \\
&= B\{\theta(1 - f_0)[P_M^T(z) - 1]\} \\
&= P_N[P_M^T(z); \theta(1 - f_0)].
\end{aligned}
$$
□


This shows that adding, deleting, or modifying the probability at zero in the 
secondary distribution does not add a new distribution because it is equivalent 
to modifying the parameter $\theta$ of the primary distribution. This means that,
for example, a Poisson primary distribution with a Poisson, zero-truncated 
Poisson, or zero-modified Poisson secondary distribution will still lead to a 
Neyman Type A (Poisson-Poisson) distribution. 


Example 4.55 Determine the probabilities for a Poisson-zero-modified ETNB
distribution where the parameters are $\lambda = 7.5$, $p_0^M = 0.6$, $r = -0.5$, and $\beta = 1$.


From Example 4.45 the secondary probabilities are $f_0 = 0.6$, $f_1 = 0.341421$,
$f_2 = 0.042678$, and $f_3 = 0.010670$. From (4.24), $g_0 = \exp[-7.5(1 - 0.6)] =
0.049787$. For the Poisson primary distribution, a = 0 and b = 7.5. The
recursive formula in (4.22) becomes

$$g_k = \frac{\sum_{j=1}^{k}(7.5j/k)f_jg_{k-j}}{1 - 0(0.6)} = \sum_{j=1}^{k}\frac{7.5j}{k}f_jg_{k-j}.$$

Then,

$$
\begin{aligned}
g_1 &= \frac{7.5(1)}{1}\,0.341421(0.049787) = 0.127487, \\
g_2 &= \frac{7.5(1)}{2}\,0.341421(0.127487) + \frac{7.5(2)}{2}\,0.042678(0.049787) = 0.179161, \\
g_3 &= \frac{7.5(1)}{3}\,0.341421(0.179161) + \frac{7.5(2)}{3}\,0.042678(0.127487)
      + \frac{7.5(3)}{3}\,0.010670(0.049787) = 0.184112.
\end{aligned}
$$

Except for slight rounding differences, these probabilities are the same as those
obtained in Example 4.52. □
4.6.8 Further properties of the compound Poisson class


Of central importance within the class of compound frequency models is the 
class of compound Poisson frequency distributions. Physical motivation for
this model arises from the fact that the Poisson distribution is often a good
model to describe the number of claim-causing accidents, and the number of 
claims from an accident is often itself a random variable. This physical moti- 
vation is discussed in the previous subsection. However, there are numerous 
convenient mathematical properties enjoyed by the compound Poisson class. 
In particular, those involving recursive evaluation of the probabilities were 
also discussed in the previous subsection. In addition, there is a close con- 
nection between the compound Poisson distributions and the mixed Poisson 
frequency distributions which is discussed in more detail in Section 4.6.10. 
Here we consider some other properties of these distributions. The compound 
Poisson pgf may be expressed as 


$$P(z) = \exp\{\lambda[Q(z) - 1]\} \qquad (4.26)$$

where Q(z) is the pgf of the secondary distribution.


Example 4.56 Obtain the pgf for the Poisson-ETNB distribution and show 
that it looks like the pgf of a Poisson—negative binomial distribution. 


The ETNB distribution has pgf

$$Q(z) = \frac{[1 - \beta(z-1)]^{-r} - (1+\beta)^{-r}}{1 - (1+\beta)^{-r}}$$

for $\beta > 0$, $r > -1$, and $r \neq 0$. Then the Poisson-ETNB distribution has as
the logarithm of its pgf

$$
\begin{aligned}
\ln P(z) &= \lambda\left\{\frac{[1 - \beta(z-1)]^{-r} - (1+\beta)^{-r}}{1 - (1+\beta)^{-r}} - 1\right\} \\
&= \lambda\left\{\frac{[1 - \beta(z-1)]^{-r} - 1}{1 - (1+\beta)^{-r}}\right\} \\
&= \mu\{[1 - \beta(z-1)]^{-r} - 1\},
\end{aligned}
$$

where $\mu = \lambda/[1 - (1+\beta)^{-r}]$. This defines a compound Poisson distribution
with primary mean $\mu$ and secondary pgf $[1 - \beta(z-1)]^{-r}$, which is the pgf of a
negative binomial random variable, as long as r and hence $\mu$ are positive. This
illustrates that the probability at zero in the secondary distribution has no
impact on the compound Poisson form. Also, the above calculation demonstrates
that the Poisson-ETNB pgf P(z), with $\ln P(z) = \mu\{[1 - \beta(z-1)]^{-r} - 1\}$, has
parameter space $\{\beta > 0, r > -1, \mu r > 0\}$, a useful observation with respect
to estimation and analysis of the parameters. □


We can compare the skewness (third moment) of these distributions to
develop an appreciation of the amount by which the skewness, and hence
the tails of these distributions, can vary even when the mean and variance
are fixed. From (4.26) (see Exercise 4.53) and Definition 3.4, the mean and
second and third central moments of the compound Poisson distribution are

$$
\begin{aligned}
\mu_1' &= \mu = \lambda m_1', \\
\mu_2 &= \sigma^2 = \lambda m_2', \qquad (4.27) \\
\mu_3 &= \lambda m_3',
\end{aligned}
$$

where $m_j'$ is the jth raw moment of the secondary distribution. The coefficient
of skewness is

$$\gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{m_3'}{\lambda^{1/2}(m_2')^{3/2}}.$$
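For the Poisson-binomial distribution, with a bit of algebra (see Exercise 4.54)
we obtain

$$
\begin{aligned}
\mu &= \lambda mq, \\
\sigma^2 &= \mu[1 + (m-1)q], \qquad (4.28) \\
\mu_3 &= 3\sigma^2 - 2\mu + \frac{m-2}{m-1}\,\frac{(\sigma^2 - \mu)^2}{\mu}.
\end{aligned}
$$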


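Carrying out similar exercises for the negative binomial, Polya-Aeppli,
Neyman Type A, and Poisson-ETNB distributions yields

$$
\begin{aligned}
\text{Negative binomial:}\quad \mu_3 &= 3\sigma^2 - 2\mu + 2\,\frac{(\sigma^2 - \mu)^2}{\mu} \\
\text{Polya-Aeppli:}\quad \mu_3 &= 3\sigma^2 - 2\mu + \frac{3}{2}\,\frac{(\sigma^2 - \mu)^2}{\mu} \\
\text{Neyman Type A:}\quad \mu_3 &= 3\sigma^2 - 2\mu + \frac{(\sigma^2 - \mu)^2}{\mu} \\
\text{Poisson-ETNB:}\quad \mu_3 &= 3\sigma^2 - 2\mu + \frac{r+2}{r+1}\,\frac{(\sigma^2 - \mu)^2}{\mu}
\end{aligned}
$$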


For the Poisson-ETNB distribution, the range of r is $-1 < r < \infty$, $r \neq 0$.
Note that as $r \to 0$ the secondary distribution is logarithmic, resulting in the
negative binomial distribution.

Note that for fixed mean and variance the third moment only changes
through the coefficient in the last term for each of the five distributions. For
the Poisson distribution, $\mu_3 = \lambda = 3\sigma^2 - 2\mu$, and so the third term for each
expression for $\mu_3$ represents the change from the Poisson distribution. For the
Poisson-binomial distribution, if m = 1, the distribution is Poisson because
it is equivalent to a Poisson-zero-truncated binomial as truncation at zero
leaves only probability at 1. Another view is that from (4.28) we have

$$
\begin{aligned}
\mu_3 &= 3\sigma^2 - 2\mu + \frac{m-2}{m-1}\,\frac{(m-1)^2q^4\lambda^2m^2}{\lambda mq} \\
&= 3\sigma^2 - 2\mu + (m-2)(m-1)q^3\lambda m,
\end{aligned}
$$

which reduces to the Poisson value for $\mu_3$ when m = 1. Hence, it is necessary
that $m \geq 2$ for non-Poisson distributions to be created. Then the coefficient
satisfies

$$0 \leq \frac{m-2}{m-1} < 1.$$


For the Poisson-ETNB, because $r > -1$, the coefficient satisfies

$$1 < \frac{r+2}{r+1} < \infty,$$

noting that when r = 0 this refers to the negative binomial distribution. For
the Neyman Type A distribution, the coefficient is exactly 1. Hence, these
three distributions provide any desired degree of skewness greater than that
of the Poisson distribution. Note that the Polya-Aeppli and the negative
binomial distributions are special and limiting cases of the Poisson-ETNB
with r = 1 and $r \to 0$, respectively.


Example 4.57 The data in Table 4.5 are taken from Hossack et al. [62] and 
give the distribution of the number of claims on automobile insurance policies 
in Australia. Determine an appropriate frequency model based on the skewness 
results of this section. 


The mean, variance, and third central moment are 0.1254614, 0.1299599,
and 0.1401737, respectively. For these numbers,

$$\frac{\mu_3 - 3\sigma^2 + 2\mu}{(\sigma^2 - \mu)^2/\mu} = 7.543865.$$

From among the Poisson-binomial, negative binomial, Polya-Aeppli, Neyman
Type A, and Poisson-ETNB distributions, only the last is appropriate, because
the coefficient exceeds 2, the largest value attained by the other four. For
this distribution, an estimate of r can be obtained from

$$7.543865 = \frac{r+2}{r+1},$$

resulting in $r = -0.8471851$. In Example 13.14 a more formal estimation and
selection procedure will be applied, but the conclusion will be the same. □
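
The numbers in Example 4.57 can be reproduced directly from the observed frequencies in Table 4.5. The sketch below assumes the moments are computed with divisor n rather than n - 1 (this choice matches the values quoted in the example), and all variable names are illustrative.

```python
# Observed frequencies for 0, 1, ..., 5 claims (Table 4.5).
counts = [565664, 68714, 5177, 365, 24, 6]
n = sum(counts)

mean = sum(k * c for k, c in enumerate(counts)) / n
# Central moments with divisor n (an assumption; see the lead-in).
variance = sum((k - mean) ** 2 * c for k, c in enumerate(counts)) / n
third = sum((k - mean) ** 3 * c for k, c in enumerate(counts)) / n

# Coefficient multiplying (sigma^2 - mu)^2 / mu in the third-moment formulas.
coefficient = (third - 3 * variance + 2 * mean) / ((variance - mean) ** 2 / mean)

# Poisson-ETNB: coefficient = (r + 2) / (r + 1), solved for r.
r_hat = (2 - coefficient) / (coefficient - 1)

print(mean, variance, third, coefficient, r_hat)
```

Because the numerator of the coefficient is a small difference of much larger quantities, the moments should be carried at full precision before the ratio is formed.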


A very useful property of the compound Poisson class of probability distri- 
butions is the fact that it is closed under convolution. We have the following 
theorem. 


Theorem 4.58 Suppose that $S_i$ has a compound Poisson distribution with
Poisson parameter $\lambda_i$ and secondary distribution $\{q_n(i);\ n = 0,1,2,\ldots\}$ for
i = 1,2,3,...,k. Suppose also that $S_1, S_2, \ldots, S_k$ are independent random
variables. Then $S = S_1 + S_2 + \cdots + S_k$ also has a compound Poisson distribution
with Poisson parameter $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_k$ and secondary distribution
$\{q_n;\ n = 0,1,2,\ldots\}$, where $q_n = [\lambda_1q_n(1) + \lambda_2q_n(2) + \cdots + \lambda_kq_n(k)]/\lambda$.


Table 4.5 Hossack et al. data

No. of claims    Observed frequency
0                565,664
1                68,714
2                5,177
3                365
4                24
5                6
6+               0


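Proof: Let $Q_i(z) = \sum_{n=0}^{\infty} q_n(i)z^n$ for i = 1,2,...,k. Then $S_i$ has pgf
$P_{S_i}(z) = E(z^{S_i}) = \exp\{\lambda_i[Q_i(z) - 1]\}$. Because the $S_i$s are independent,
S has pgf

$$
\begin{aligned}
P_S(z) &= \prod_{i=1}^{k} P_{S_i}(z) \\
&= \prod_{i=1}^{k} \exp\{\lambda_i[Q_i(z) - 1]\} \\
&= \exp\left[\sum_{i=1}^{k}\lambda_iQ_i(z) - \sum_{i=1}^{k}\lambda_i\right] \\
&= \exp\{\lambda[Q(z) - 1]\},
\end{aligned}
$$

where $\lambda = \sum_{i=1}^{k}\lambda_i$ and $Q(z) = \sum_{i=1}^{k}\lambda_iQ_i(z)/\lambda$. The result follows by the
uniqueness of the generating function. □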


One main advantage of this result is computational. If we are interested 
in the sum of independent compound Poisson random variables, then we do 
not need to compute the distribution of each compound Poisson random vari- 
able separately (i.e., recursively using Example 4.50) because Theorem 4.58 
implies that a single application of the compound Poisson recursive formula 
in Example 4.50 will suffice. The following example illustrates this idea. 


Example 4.59 Suppose that k = 2 and $S_1$ has a compound Poisson distribution
with $\lambda_1 = 2$ and secondary distribution $q_1(1) = 0.2$, $q_2(1) = 0.7$, and
$q_3(1) = 0.1$. Also, $S_2$ (independent of $S_1$) has a compound Poisson distribution
with $\lambda_2 = 3$ and secondary distribution $q_2(2) = 0.25$, $q_3(2) = 0.6$, and
$q_4(2) = 0.15$. Determine the distribution of $S = S_1 + S_2$.


We have $\lambda = \lambda_1 + \lambda_2 = 2 + 3 = 5$. Then

$$
\begin{aligned}
q_1 &= 0.4(0.2) + 0.6(0) = 0.08, \\
q_2 &= 0.4(0.7) + 0.6(0.25) = 0.43, \\
q_3 &= 0.4(0.1) + 0.6(0.6) = 0.40, \\
q_4 &= 0.4(0) + 0.6(0.15) = 0.09.
\end{aligned}
$$

Thus, S has a compound Poisson distribution with Poisson parameter $\lambda = 5$
and secondary distribution $q_1 = 0.08$, $q_2 = 0.43$, $q_3 = 0.40$, and $q_4 = 0.09$.
Numerical values of the distribution of S may be obtained using the recursive
formula

$$\Pr(S = x) = \frac{5}{x}\sum_{n=1}^{x} nq_n\Pr(S = x - n), \quad x = 1, 2, \ldots,$$

beginning with $\Pr(S = 0) = e^{-5}$, where $q_n = 0$ for $n > 4$. □
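
The pooling step of Theorem 4.58 and the single recursion of Example 4.50 can be sketched as follows. The helper names (`combine`, `compound_poisson_pmf`, `convolve`) are hypothetical; the final convolution is included only as a cross-check that pooling gives the same answer as treating $S_1$ and $S_2$ separately.

```python
import math


def compound_poisson_pmf(lam, q, x_max):
    """Pr(S = x), x = 0..x_max, for Poisson parameter lam and secondary
    probabilities q[n] = q_n (list index = number of claims)."""
    p = [math.exp(-lam * (1.0 - q[0]))]
    for x in range(1, x_max + 1):
        s = sum(n * q[n] * p[x - n] for n in range(1, min(x, len(q) - 1) + 1))
        p.append(lam / x * s)
    return p


def combine(lams, qs):
    """Pooled Poisson parameter and lambda-weighted secondary distribution."""
    lam = sum(lams)
    size = max(len(q) for q in qs)
    q = [
        sum(l * (qi[n] if n < len(qi) else 0.0) for l, qi in zip(lams, qs)) / lam
        for n in range(size)
    ]
    return lam, q


def convolve(u, v, x_max):
    """Distribution of the sum of two independent counts, up to x_max."""
    return [sum(u[i] * v[x - i] for i in range(x + 1)) for x in range(x_max + 1)]


q1 = [0.0, 0.2, 0.7, 0.1]          # secondary distribution of S1
q2 = [0.0, 0.0, 0.25, 0.6, 0.15]   # secondary distribution of S2
lam, q = combine([2.0, 3.0], [q1, q2])

pooled = compound_poisson_pmf(lam, q, 20)
separate = convolve(compound_poisson_pmf(2.0, q1, 20),
                    compound_poisson_pmf(3.0, q2, 20), 20)
```

The pooled route needs one recursion, while the separate route needs two recursions plus a convolution, which is the computational advantage the text describes.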


In various situations the convolution of negative binomial distributions is 
of interest. The following example indicates how this distribution may be 
evaluated. 


Example 4.60 (Convolutions of negative binomial distributions). Suppose
that $N_i$ has a negative binomial distribution with parameters $r_i$ and $\beta_i$ for
i = 1,2,...,k and that $N_1, N_2, \ldots, N_k$ are independent. Determine the
distribution of $N = N_1 + N_2 + \cdots + N_k$.


The pgf of $N_i$ is $P_{N_i}(z) = [1 - \beta_i(z-1)]^{-r_i}$ and that of N is

$$P_N(z) = \prod_{i=1}^{k} P_{N_i}(z) = \prod_{i=1}^{k} [1 - \beta_i(z-1)]^{-r_i}.$$

If $\beta_i = \beta$ for i = 1,2,...,k, then $P_N(z) = [1 - \beta(z-1)]^{-(r_1 + r_2 + \cdots + r_k)}$,
and N has a negative binomial distribution with parameters
$r = r_1 + r_2 + \cdots + r_k$ and $\beta$.

If not all the $\beta_i$s are identical, however, we may proceed as follows. From
Example 4.53,

$$P_{N_i}(z) = [1 - \beta_i(z-1)]^{-r_i} = e^{\lambda_i[Q_i(z) - 1]}$$

where $\lambda_i = r_i\ln(1 + \beta_i)$ and

$$Q_i(z) = 1 - \frac{\ln[1 - \beta_i(z-1)]}{\ln(1 + \beta_i)} = \sum_{n=1}^{\infty} q_n(i)z^n$$

with

$$q_n(i) = \frac{[\beta_i/(1 + \beta_i)]^n}{n\ln(1 + \beta_i)}, \quad n = 1, 2, \ldots.$$

But Theorem 4.58 implies that $N = N_1 + N_2 + \cdots + N_k$ has a compound
Poisson distribution with Poisson parameter

$$\lambda = \sum_{i=1}^{k} r_i\ln(1 + \beta_i)$$

and secondary distribution

$$q_n = \sum_{i=1}^{k}\frac{\lambda_i}{\lambda}q_n(i)
     = \frac{\sum_{i=1}^{k} r_i[\beta_i/(1 + \beta_i)]^n}{n\sum_{i=1}^{k} r_i\ln(1 + \beta_i)}, \quad n = 1, 2, 3, \ldots.$$

The distribution of N may be computed recursively using the formula

$$\Pr(N = n) = \frac{\lambda}{n}\sum_{k=1}^{n} kq_k\Pr(N = n - k), \quad n = 1, 2, \ldots,$$

beginning with $\Pr(N = 0) = e^{-\lambda} = \prod_{i=1}^{k}(1 + \beta_i)^{-r_i}$ and with $\lambda$ and $q_n$ as
given above. □
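
As an illustration of Example 4.60, the sketch below evaluates the sum of independent negative binomials through its compound Poisson representation and compares the result with a direct convolution. The parameter values and function names are illustrative assumptions; the individual negative binomial probabilities use the (a, b, 0) recursion (4.21) with $a = \beta/(1+\beta)$ and $b = (r-1)\beta/(1+\beta)$.

```python
import math


def negbin_pmf(r, beta, n_max):
    """Negative binomial probabilities 0..n_max via the (a, b, 0) recursion."""
    a = beta / (1.0 + beta)
    b = (r - 1.0) * a
    p = [(1.0 + beta) ** (-r)]
    for k in range(1, n_max + 1):
        p.append((a + b / k) * p[-1])
    return p


def negbin_sum_pmf(rs, betas, n_max):
    """Pr(N = n), n = 0..n_max, for N = N_1 + ... + N_k, using the compound
    Poisson form with a mixture of logarithmic secondary distributions."""
    lam = sum(r * math.log1p(b) for r, b in zip(rs, betas))
    q = [0.0] + [
        sum(r * (b / (1.0 + b)) ** n for r, b in zip(rs, betas)) / (n * lam)
        for n in range(1, n_max + 1)
    ]
    p = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        p.append(lam / n * sum(k * q[k] * p[n - k] for k in range(1, n + 1)))
    return p


def convolve(u, v, n_max):
    return [sum(u[i] * v[n - i] for i in range(n + 1)) for n in range(n_max + 1)]


# Illustrative parameters (not from the text): two components with unequal betas.
rs, betas = [2.0, 1.5], [0.5, 2.0]
via_recursion = negbin_sum_pmf(rs, betas, 30)
via_convolution = convolve(negbin_pmf(rs[0], betas[0], 30),
                           negbin_pmf(rs[1], betas[1], 30), 30)
```

With k components the recursion is applied once, whereas direct convolution must be repeated k - 1 times.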


It is not hard to see that Theorem 4.58 is a generalization of Theorem 
4.37, which may be recovered with qı(i) = 1 for i = 1,2,...,k. Similarly, 
the decomposition result of Theorem 4.38 may also be extended to compound 
Poisson random variables, where the decomposition is on the basis of the 
region of support of the secondary distribution. See Panjer and Willmot 
[106], Sec. 6.4 or Karlin and Taylor [72], Sec. 16.9 for further details. 


4.6.9 Mixed frequency models 


Many compound distributions can arise in a way that is very different from 
compounding. In this section, we examine mixture distributions by treating 
one or more parameters as being “random” in some sense. This section ex- 
pands on the ideas discussed in Section 4.6.3 in connection with the gamma 
mixture of the Poisson distribution being negative binomial. 

We assume that the parameter is distributed over the population under 
consideration (the collective) and that the sampling scheme that generates 
our data has two stages. First, a value of the parameter is selected. Then, 
given that parameter value, an observation is generated using that parameter 
value. 

In automobile insurance, for example, classification schemes attempt to put 
individuals into (relatively) homogeneous groups for the purpose of pricing. 
Variables used to develop the classification scheme might include age, expe- 
rience, a history of violations, accident history, and other variables. Because 
there will always be some residual variation in accident risk within each class, 
mixed distributions provide a framework for modeling this heterogeneity. 

Let $P(z|\theta)$ denote the pgf of the number of events (e.g., claims) if the risk
parameter is known to be $\theta$. The parameter, $\theta$, might be the Poisson mean,
for example, in which case the measurement of risk is the expected number
of events in a fixed time period.

Let $U(\theta) = \Pr(\Theta \leq \theta)$ be the cdf of $\Theta$, where $\Theta$ is the risk parameter,
which is viewed as a random variable. Then $U(\theta)$ represents the probability
that, when a value of $\Theta$ is selected (e.g., a driver is included in our sample),
the value of the risk parameter does not exceed $\theta$. Let $u(\theta)$ be the pf or pdf
of $\Theta$. Then

$$P(z) = \int P(z|\theta)u(\theta)\,d\theta \quad\text{or}\quad P(z) = \sum P(z|\theta_j)u(\theta_j) \qquad (4.29)$$

is the unconditional pgf of the number of events (where the formula selected
depends on whether $\Theta$ is discrete or continuous$^5$). The corresponding
probabilities are denoted by

$$p_k = \int p_k(\theta)u(\theta)\,d\theta \quad\text{or}\quad p_k = \sum p_k(\theta_j)u(\theta_j). \qquad (4.30)$$

The mixing distribution denoted by $U(\theta)$ may be of the discrete or continuous
type or even a combination of discrete and continuous types. Discrete
mixtures are mixtures of distributions when the mixing function is of the
discrete type. Similarly for continuous mixtures. This phenomenon was
introduced for continuous mixtures of severity distributions in Section 4.4.5
and for finite discrete mixtures in Section 4.2.3.

It should be noted that the mixing distribution is unobservable because the
data are drawn from the mixed distribution.


Example 4.61 Demonstrate that the zero-modified distributions may be created
by using a two-point mixture.


Suppose

$$P(z) = p \cdot 1 + (1 - p)P_2(z).$$

This is a (discrete) two-point mixture of a degenerate distribution that places
all probability at zero and a distribution with pgf $P_2(z)$. From (4.25) this is
also a compound Bernoulli distribution. □


Many mixed models can be constructed beginning with a simple distribution.
Two examples are given here.


Example 4.62 Determine the pf for a mixed binomial with a beta mixing
distribution. This distribution is called binomial-beta, negative hypergeometric,
or Polya-Eggenberger.


The beta distribution has pdf

$$u(q) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}q^{a-1}(1-q)^{b-1}, \quad a > 0,\ b > 0.$$

Then the mixed distribution has probabilities

$$
\begin{aligned}
p_k &= \int_0^1 \binom{m}{k}q^k(1-q)^{m-k}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}q^{a-1}(1-q)^{b-1}\,dq \\
&= \frac{\Gamma(a+b)\Gamma(m+1)\Gamma(a+k)\Gamma(b+m-k)}{\Gamma(a)\Gamma(b)\Gamma(k+1)\Gamma(m-k+1)\Gamma(a+b+m)} \\
&= \frac{\binom{-a}{k}\binom{-b}{m-k}}{\binom{-a-b}{m}}, \quad k = 0, 1, \ldots, m.
\end{aligned}
$$
□


Example 4.63 Determine the pf for a mixed negative binomial distribution
with mixing on the parameter $p = (1 + \beta)^{-1}$. Let p have a beta distribution.
The mixed distribution is called the generalized Waring.


Arguing as in Example 4.62 we have

$$
\begin{aligned}
p_k &= \frac{\Gamma(r+k)}{\Gamma(r)\Gamma(k+1)}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_0^1 p^{a+r-1}(1-p)^{b+k-1}\,dp \\
&= \frac{\Gamma(r+k)}{\Gamma(r)\Gamma(k+1)}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\frac{\Gamma(a+r)\Gamma(b+k)}{\Gamma(a+r+b+k)}, \quad k = 0, 1, 2, \ldots.
\end{aligned}
$$

When b = 1, this distribution is called the Waring distribution. When
r = b = 1, it is termed the Yule distribution. □


4.6.10 Poisson mixtures


If we let $p_k(\theta)$ in (4.30) have the Poisson distribution, this leads to a class of
distributions with useful properties. A simple example of a Poisson mixture
is the two-point mixture.


Example 4.64 Suppose drivers can be classified as "good drivers" and "bad
drivers," each group with its own Poisson distribution. Determine the pf for
this model and fit it to the data from Example 12.56. This model and its
application to the data set are from Tröbliger [130].


From (4.30) the pf is

$$p_k = p\,\frac{e^{-\lambda_1}\lambda_1^k}{k!} + (1 - p)\,\frac{e^{-\lambda_2}\lambda_2^k}{k!}.$$

The maximum likelihood estimates$^6$ were calculated by Tröbliger to be
$\hat{p} = 0.94$, $\hat{\lambda}_1 = 0.11$, and $\hat{\lambda}_2 = 0.70$. This means that about 94% of drivers
were "good" with a risk of $\lambda_1 = 0.11$ expected accidents per year and 6%
were "bad" with a risk of $\lambda_2 = 0.70$ expected accidents per year. Note that
it is not possible to return to the data set and identify which were the bad
drivers. □


$^5$We could have written the more general $P(z) = \int P(z|\theta)\,dU(\theta)$, which would
include situations where $\Theta$ has a distribution that is partly continuous and
partly discrete.

$^6$Maximum likelihood estimation is discussed in Section 12.2.
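
The two-point Poisson mixture of Example 4.64 is simple to evaluate. The sketch below uses the estimates quoted in the example; the function names and the truncation of the support at 40 are illustrative assumptions. It also computes the mean and variance of the mixed distribution numerically.

```python
import math


def poisson_pmf(lam, k):
    """Poisson probability of k events for lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def two_point_poisson_pmf(p, lam1, lam2, k):
    """Mixture: weight p on Poisson(lam1), weight 1 - p on Poisson(lam2)."""
    return p * poisson_pmf(lam1, k) + (1.0 - p) * poisson_pmf(lam2, k)


# Estimates quoted in Example 4.64.
p_hat, lam1_hat, lam2_hat = 0.94, 0.11, 0.70

# The support is truncated at 40; the omitted tail is negligible here.
probs = [two_point_poisson_pmf(p_hat, lam1_hat, lam2_hat, k) for k in range(41)]
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

print(mean, variance)
```

The variance exceeds the mean because the Poisson parameter itself varies between the two groups of drivers.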


This example illustrates two important points about finite mixtures. First, 
the model is probably oversimplified in the sense that risks (e.g., drivers) 
probably exhibit a continuum of risk levels rather than just two. The second 
point is that finite mixture models have a lot of parameters to be estimated.
The simple two-point Poisson mixture above has three parameters. Increasing
the number of distributions in the mixture to r will then involve r — 1 mixing 
parameters in addition to the total number of parameters in the r component 
distributions. As a result of this, continuous mixtures are frequently preferred. 

The class of mixed Poisson distributions has some interesting properties 
that will be developed here. 

Let P(z) be the pgf of a mixed Poisson distribution with arbitrary mixing
distribution $U(\theta)$. Then (with formulas given for the continuous case), by
introducing a scale parameter $\lambda$, we have

$$
\begin{aligned}
P(z) &= \int e^{\lambda\theta(z-1)}u(\theta)\,d\theta = \int \left[e^{\lambda(z-1)}\right]^{\theta}u(\theta)\,d\theta \\
&= E\left\{\left[e^{\lambda(z-1)}\right]^{\Theta}\right\} = M_\Theta[\lambda(z-1)], \qquad (4.31)
\end{aligned}
$$

where $M_\Theta(z)$ is the mgf of the mixing distribution.

Therefore, $P'(z) = \lambda M_\Theta'[\lambda(z-1)]$ and with z = 1 we obtain $E(N) = \lambda E(\Theta)$,
where N has the mixed Poisson distribution. Also, $P''(z) = \lambda^2M_\Theta''[\lambda(z-1)]$,
implying that $E[N(N-1)] = \lambda^2E(\Theta^2)$ and therefore

$$
\begin{aligned}
\mathrm{Var}(N) &= E[N(N-1)] + E(N) - [E(N)]^2 \\
&= \lambda^2E(\Theta^2) + E(N) - \lambda^2[E(\Theta)]^2 \\
&= \lambda^2\mathrm{Var}(\Theta) + E(N) \\
&> E(N)
\end{aligned}
$$

and thus for mixed Poisson distributions the variance is always greater than
the mean.

Douglas [29] proves that for any mixed Poisson distribution the mixing
distribution is unique. This means that two different mixing distributions
cannot lead to the same mixed Poisson distribution. This allows us to identify
the mixing distribution in some cases.

There is also an important connection between mixed Poisson distributions
and compound Poisson distributions.


Definition 4.65 A distribution is said to be infinitely divisible if for all
values of n = 1,2,3,... its characteristic function $\varphi(z)$ can be written as

$$\varphi(z) = [\varphi_n(z)]^n,$$

where $\varphi_n(z)$ is the characteristic function of some random variable.


In other words, taking the (1/n)th power of the characteristic function still 
results in a characteristic function. The characteristic function is defined as 
follows. 


Definition 4.66 The characteristic function of a random variable X is

$$\varphi_X(z) = E(e^{izX}) = E(\cos zX + i\sin zX),$$

where $i = \sqrt{-1}$.


In Definition 4.65, “characteristic function” could have been replaced by 
“moment generating function” or “probability generating function,” or some 
other transform. That is, if the definition is satisfied for one of these trans- 
forms, it will be satisfied for all others which exist for the particular random 
variable. We choose the characteristic function because it exists for all dis- 
tributions while the moment generating function does not exist for some dis- 
tributions with heavy tails. Because many earlier results involved probability 
generating functions, it is useful to note the relationship between it and the 
characteristic function. 


Theorem 4.67 If the probability generating function exists for a random
variable X, then $P_X(z) = \varphi_X(-i\ln z)$ and $\varphi_X(z) = P_X(e^{iz})$.


Proof:

$$P_X(z) = E(z^X) = E(e^{X\ln z}) = E[e^{-i(i\ln z)X}] = \varphi_X(-i\ln z)$$

and

$$\varphi_X(z) = E(e^{izX}) = E[(e^{iz})^X] = P_X(e^{iz}).$$
□


The following distributions, among others, are infinitely divisible: normal, 
gamma, Poisson, negative binomial. The binomial distribution is not infi- 
nitely divisible because the exponent m in its pgf must take on integer values. 
Dividing m by n = 1,2,3,... will result in nonintegral values. In fact, no 
distributions with a finite range of support (the range over which positive 
probabilities exist) can be infinitely divisible. Now to the important result. 


Theorem 4.68 Suppose P(z) is a mixed Poisson pgf with an infinitely divisible
mixing distribution. Then P(z) is also a compound Poisson pgf and may
be expressed as

$$P(z) = e^{\lambda[P_2(z) - 1]},$$

where $P_2(z)$ is a pgf. If one adopts the convention that $P_2(0) = 0$, then $P_2(z)$
is unique.

A proof can be found in Feller [35], Ch. 12. If one chooses any infinitely
divisible mixing distribution, the corresponding mixed Poisson distribution 
can be equivalently described as a compound Poisson distribution. For some 
distributions, this is a distinct advantage when carrying out numerical work 
because the recursive formula (4.22) can be used in evaluating the probabilities 
once the secondary distribution is identified. For most cases, this identification 
is easily carried out. A second advantage is that, because the same distribution 
can be motivated in two different ways, a specific explanation is not required 
in order to use it. Conversely, the fact that one of these models fits well does 
not imply that it is the result of mixing or compounding. For example, the 
fact that claims follow a negative binomial distribution does not imply that 
individuals have the Poisson distribution and the Poisson parameter has a 
gamma distribution. 


Example 4.69 Use the above results and (4.31) to demonstrate that a gamma
mixture of Poisson variables is negative binomial.


If the mixing distribution is gamma, it has the following moment generating
function (as derived in Example 3.15 and where $\beta$ plays the role of $1/\theta$):

$$M_\Theta(t) = \left(\frac{\beta}{\beta - t}\right)^{\alpha}, \quad \beta > 0,\ \alpha > 0,\ t < \beta.$$

It is clearly infinitely divisible because $[M_\Theta(t)]^{1/n}$ is the mgf of a gamma
distribution with parameters $\alpha/n$ and $\beta$. Then the pgf of the mixed Poisson
distribution is

$$P(z) = \left[\frac{\beta}{\beta - \lambda(z-1)}\right]^{\alpha} = \left[1 - \frac{\lambda}{\beta}(z-1)\right]^{-\alpha},$$

which is the form of the pgf of the negative binomial distribution where the
negative binomial parameter r is equal to $\alpha$ and the parameter $\beta$ is equal to
$\lambda/\beta$. □
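
The gamma-Poisson result can be checked numerically by integrating the Poisson probabilities against the gamma density, as in (4.30), and comparing with the negative binomial pf. This is a rough sketch under stated assumptions: the midpoint rule, the integration limit, the step count, and the parameter values are all illustrative choices, and `rate` denotes the gamma parameter written as $\beta$ in the mgf above.

```python
import math


def negbin_pmf(r, beta, k):
    """Negative binomial pf with pgf [1 - beta (z - 1)]^(-r)."""
    log_p = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
             + k * math.log(beta / (1.0 + beta)) - r * math.log(1.0 + beta))
    return math.exp(log_p)


def gamma_poisson_pmf(alpha, rate, lam, k, upper=40.0, steps=40000):
    """Midpoint-rule approximation to the mixed Poisson probability
    p_k = integral of Poisson(lam * theta; k) * gamma_pdf(theta) d theta."""
    h = upper / steps
    log_norm = alpha * math.log(rate) - math.lgamma(alpha) - math.lgamma(k + 1)
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        log_term = (-lam * theta + k * math.log(lam * theta)
                    + (alpha - 1.0) * math.log(theta) - rate * theta + log_norm)
        total += math.exp(log_term)
    return total * h


# Illustrative parameters (not from the text).
alpha, rate, lam = 2.0, 1.5, 1.0
mixed = [gamma_poisson_pmf(alpha, rate, lam, k) for k in range(6)]
closed_form = [negbin_pmf(alpha, lam / rate, k) for k in range(6)]
```

The agreement between the two lists reflects the identification $r = \alpha$ and negative binomial $\beta = \lambda/\beta$ made in Example 4.69.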


It was shown in Example 4.53 that a compound Poisson distribution with a
logarithmic secondary distribution is a negative binomial distribution.
Therefore the theorem holds true for this case. It is not difficult to see that, if $u(\theta)$
is the pf for any discrete random variable with pgf $P_\Theta(z)$, then the pgf of the
mixed Poisson distribution is $P_\Theta[e^{\lambda(z-1)}]$, a compound distribution with a
Poisson secondary distribution.


Example 4.70 Demonstrate that the Neyman Type A distribution can be
obtained by mixing.


If in (4.31) the mixing distribution has pgf

$$P_\Theta(z) = e^{\mu(z-1)},$$

then the mixed Poisson distribution has pgf

$$P(z) = \exp\{\mu[e^{\lambda(z-1)} - 1]\},$$

the pgf of a compound Poisson with a Poisson secondary distribution, that is,
the Neyman Type A distribution. □


A further interesting result obtained by Holgate [60] is that, if a mixing 
distribution is absolutely continuous and unimodal, then the resulting mixed 
Poisson distribution is also unimodal. Multimodality can occur when discrete 
mixing functions are used. For example, the Neyman Type A distribution can 
have more than one mode. The reader should try this calculation for various 
combinations of the two parameters. 

Most continuous distributions in this book involve a scale parameter. This 
means that scale changes to distributions do not cause a change in the form 
of the distribution, only in the value of its scale parameter. For the mixed 
Poisson distribution, with pgf (4.31), any change in $\lambda$ is equivalent to a change
in the scale parameter of the mixing distribution. Hence, it may be convenient
to simply set $\lambda = 1$ where a mixing distribution with a scale parameter is used.


Example 4.71 Show that a mixed Poisson with an inverse Gaussian mixing
distribution is the same as a Poisson-ETNB distribution with $r = -0.5$.


The inverse Gaussian distribution is described in Appendix A. It has pdf

$$f(x) = \left(\frac{\theta}{2\pi x^3}\right)^{1/2}\exp\left[-\frac{\theta}{2x}\left(\frac{x - \mu}{\mu}\right)^2\right], \quad x > 0,$$

which is conveniently rewritten as

$$f(x) = \frac{\mu}{(2\pi\beta x^3)^{1/2}}\exp\left[-\frac{(x - \mu)^2}{2\beta x}\right], \quad x > 0,$$

where $\beta = \mu^2/\theta$. The mgf of this distribution is (see Exercise 3.24)

$$M(t) = \exp\left\{-\frac{\mu}{\beta}\left[(1 - 2\beta t)^{1/2} - 1\right]\right\}.$$

Hence, the inverse Gaussian distribution is infinitely divisible ($[M(t)]^{1/n}$ is
the mgf of an inverse Gaussian distribution with $\mu$ replaced by $\mu/n$). From
(4.31) with $\lambda = 1$, the pgf of the mixed distribution is

$$P(z) = \exp\left(-\frac{\mu}{\beta}\left\{[1 + 2\beta(1 - z)]^{1/2} - 1\right\}\right).$$

By setting

$$\lambda = \frac{\mu}{\beta}\left[(1 + 2\beta)^{1/2} - 1\right]$$

and

$$P_2(z) = \frac{[1 - 2\beta(z-1)]^{1/2} - (1 + 2\beta)^{1/2}}{1 - (1 + 2\beta)^{1/2}},$$

we see that

$$P(z) = \exp\{\lambda[P_2(z) - 1]\},$$

where $P_2(z)$ is the pgf of the extended truncated negative binomial distribution
with $r = -\frac{1}{2}$.

Hence, the Poisson-inverse Gaussian distribution is a compound Poisson
distribution with an ETNB ($r = -\frac{1}{2}$) secondary distribution. □


The relationships between mixed and compound Poisson distributions are
given in Table 4.6.


Table 4.6 Pairs of compound and mixed Poisson distributions

Name                        Compound secondary distribution    Mixing distribution
Negative binomial           logarithmic                        gamma
Neyman-A                    Poisson                            Poisson
Poisson-inverse Gaussian    ETNB (r = -0.5)                    inverse Gaussian

In this chapter, we focused on distributions that are easily handled com- 
putationally. Although many other discrete distributions are available, we 
believe that those discussed form a sufficiently rich class for most problems. 


4.6.11 Effect of exposure on frequency 


Assume that the current portfolio consists of n entities, each of which could
produce claims. Let $N_j$ be the number of claims produced by the jth entity.
Then $N = N_1 + \cdots + N_n$. If we assume that the $N_j$ are independent and
identically distributed, then

$$P_N(z) = [P_{N_1}(z)]^n.$$

Now suppose the portfolio is expected to expand to $n^*$ entities with
frequency $N^*$. Then

$$P_{N^*}(z) = [P_{N_1}(z)]^{n^*} = [P_N(z)]^{n^*/n}.$$

Thus, if N is infinitely divisible, the distribution of $N^*$ will have the same
form as that of N, but with modified parameters.


Example 4.72 It has been determined from past studies that the number of
workers compensation claims for a group of 300 employees in a certain
occupation class has the negative binomial distribution with $\beta = 0.3$ and $r = 10$.
Determine the frequency distribution for a group of 500 such individuals.

The pgf of $N^*$ is

$$P_{N^*}(z) = [P_N(z)]^{500/300} = \left\{[1 - 0.3(z - 1)]^{-10}\right\}^{500/300} = [1 - 0.3(z - 1)]^{-16.67},$$

which is negative binomial with $\beta = 0.3$ and $r = 16.67$. □
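
The exposure adjustment in Example 4.72 amounts to scaling $r$ by $n^*/n$ while leaving $\beta$ unchanged. A minimal Python sketch of that step follows; the helper names are hypothetical, and the check at $z = 0.4$ is an arbitrary test point.

```python
def nb_pgf(z, r, beta):
    """pgf of the negative binomial distribution: [1 - beta(z - 1)]^(-r)."""
    return (1 - beta * (z - 1)) ** (-r)


def scale_negative_binomial(r, beta, n_old, n_new):
    """Parameters after exposure changes from n_old to n_new entities.

    Raising the pgf to the power n_new / n_old multiplies r by that ratio
    and leaves beta unchanged.
    """
    return r * n_new / n_old, beta


r_new, beta_new = scale_negative_binomial(10, 0.3, 300, 500)
print(round(r_new, 2), beta_new)

# The scaled pgf equals the original pgf raised to the power 500/300.
z = 0.4
print(nb_pgf(z, 10, 0.3) ** (500 / 300), nb_pgf(z, r_new, beta_new))
```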


For the (a,b, 0) class, all members except the binomial have this property. 
For the (a, b,1) class, none of the members do. For compound distributions, 
it is the primary distribution that must be infinitely divisible. In particular, 
compound Poisson and compound negative binomial (including the geomet- 
ric) distributions will be preserved under an increase in exposure. Earlier, 
some reasons were given to support the use of zero-modified distributions. If 
exposure adjustments are anticipated, it may be better to choose a compound 
model, even if the fit is not quite as good. It should be noted that compound 
models have the ability to place large amounts of probability at zero. 


4.6.12 An inventory of discrete distributions


We have introduced the simple (a, b, 0) class, generalized to the (a,b, 1) class, 
and then used compounding and mixing to create a larger class of distribu- 
tions. Calculation of the probabilities of these distributions can be carried 
out by using simple recursive procedures. In this section we note that there 
are relationships among the various distributions similar to those of Section 
4.5.2. The specific relationships are given in Table 4.7. 

It is clear from earlier developments that members of the (a,b,0) class 
are special cases of members of the (a,b,1) class and that zero-truncated 
distributions are special cases of zero-modified distributions. The limiting 
cases are best discovered through the probability generating function, as was 
done on page 79 where the Poisson distribution is shown to be a limiting case 
of the negative binomial distribution. 

We have not listed compound distributions where the primary distribution 
is one of the two parameter models such as the negative binomial or Poisson— 
inverse Gaussian. This was done because these distributions are often them- 
selves compound Poisson distributions and, as such, are generalizations of 
distributions already presented. This collection forms a particularly rich set 
of distributions in terms of shape. However, many other distributions are 
also possible. Many others are discussed in Johnson, Kotz, and Kemp [69], 
Douglas [29], and Panjer and Willmot [106]. 


4.6.13 Exercises 


4.40 For each of the data sets in Exercises 12.96 and 12.98 on page 400 
calculate values similar to those in Table 4.2. For each, determine the most 
appropriate model from the (a, b, 0) class. 
Table 4.7 Relationships among discrete distributions

Distribution             | Is a special case of               | Is a limiting case of
Poisson                  | ZM Poisson                         | negative binomial, Poisson-binomial, Poisson-inv. Gaussian, Polya-Aeppli (a), Neyman-A (b)
ZT Poisson               | ZM Poisson                         | ZT negative binomial
ZM Poisson               |                                    | ZM negative binomial
geometric                | negative binomial, ZM geometric    | geometric-Poisson
ZT geometric             | ZT negative binomial               |
ZM geometric             | ZM negative binomial               |
logarithmic              |                                    | ZT negative binomial
ZM logarithmic           |                                    | ZM negative binomial
binomial                 | ZM binomial                        |
negative binomial        | ZM negative binomial, Poisson-ETNB |
Poisson-inverse Gaussian | Poisson-ETNB                       |
Polya-Aeppli             | Poisson-ETNB                       |
Neyman-A                 |                                    | Poisson-ETNB

(a) Also called Poisson-geometric.
(b) Also called Poisson-Poisson.


4.41 Calculate Pr(N = 0), Pr(N = 1), and Pr(N = 2) for each of the 
following distributions. 

(a) Poisson($\lambda = 4$)

(b) Geometric($\beta = 4$)

(c) Negative binomial($r = 2$, $\beta = 2$)

(d) Binomial($m = 8$, $q = 0.5$)

(e) Logarithmic($\beta = 4$)

(f) ETNB($r = -0.5$, $\beta = 4$)

(g) Poisson-inverse Gaussian($\lambda = 2$, $\beta = 4$)

(h) Zero-modified geometric($p_0^M = 0.5$, $\beta = 4$)

(i) Poisson-Poisson (Neyman Type A)($\lambda_{\text{primary}} = 4$, $\lambda_{\text{secondary}} = 1$)

(j) Poisson-ETNB($\lambda = 4$, $r = 2$, $\beta = 0.5$)

(k) Poisson-zero-modified geometric distribution($\lambda = 8$, $p_0^M = 0.5$, $r = 2$, $\beta = 0.5$)
4.42 The moment generating function (mgf) for discrete variables is
defined as

$$M_N(z) = \mathrm{E}\left(e^{zN}\right) = \sum_{k=0}^{\infty} p_k e^{zk}.$$

Demonstrate that $P_N(z) = M_N(\ln z)$. Use the fact that $\mathrm{E}(N^k) = M_N^{(k)}(0)$ to
show that $P'(1) = \mathrm{E}(N)$ and $P''(1) = \mathrm{E}[N(N - 1)]$.


4.43 Use your knowledge of the permissible ranges for the parameters of the 
Poisson, negative binomial, and binomial to determine all possible values of 
a and b for these members of the (a, b,0) class. Because these are the only 
members of the class, all other pairs must not lead to a legitimate probability 
distribution (nonnegative values that sum to 1). Show that the pair a = —1 
and b = 1.5 (which is not on the list of possible values) does not lead to a 
legitimate distribution. 


4.44 Show that for the negative binomial distribution with any $\beta > 0$ and
$r > -1$, but $r \ne 0$, the successive values of $p_k$ given by (4.15) are, for any $p_1$,
positive and $\sum_{k=1}^{\infty} p_k < \infty$.


4.45 Show that when, in the zero-truncated negative binomial distribution,
$r \to 0$ the pf is as given in (4.16).


4.46 Show that the pgf of the logarithmic distribution is as given in (4.17). 


4.47 Show that for the Sibuya distribution, which is the ETNB distribution
with $-1 < r < 0$ and $\beta \to \infty$, the mean does not exist (that is, the sum which
defines the mean does not converge). Because this random variable takes on 
nonnegative values, this also shows that no other positive moments exist. 


4.48 A frequency model that has not been mentioned to this point is the zeta
distribution. It is a zero-truncated distribution with pf $p_k^T = k^{-(\rho+1)}/\zeta(\rho + 1)$,
$k = 1, 2, \ldots$, $\rho > 0$. The denominator is the zeta function, which must be
evaluated numerically as $\zeta(\rho + 1) = \sum_{k=1}^{\infty} k^{-(\rho+1)}$. The zero-modified zeta
distribution can be formed in the usual way. More information can be found
in Luong and Doray [88]. Verify that the zeta distribution is not a member
of the (a, b, 1) class.


4.49 Do all the members of the (a, b, 0) class satisfy the condition of Theorem 
4.54? For those that do, identify the parameter (or function of its parameters) 
that plays the role of $\theta$ in the theorem.


4.50 For $i = 1, \ldots, n$ let $S_i$ have independent compound Poisson frequency
distributions with Poisson parameter $\lambda_i$ and a secondary distribution with pgf
$P_2(z)$. Note that all $n$ of the variables have the same secondary distribution.
Determine the distribution of $S = S_1 + \cdots + S_n$.


4.51 Show that the following three distributions are identical: (1) geometric-geometric,
(2) Bernoulli-geometric, (3) zero-modified geometric. That is, for
any one of the distributions with arbitrary parameters, show that there is a
member of the other two distribution types that has the same pf or pgf.


4.52 Show that the binomial-geometric and negative binomial-geometric (with
negative binomial parameter $r$ a positive integer) distributions are identical.


4.53 Show that, for any pgf, $P^{(k)}(1) = \mathrm{E}[N(N - 1) \cdots (N - k + 1)]$ provided
the expectation exists. Here $P^{(k)}(z)$ indicates the $k$th derivative. Use this
result to confirm the three moments as given in (4.27).


4.54 Verify the three moments as given in (4.28).


4.55 Show that the negative binomial-Poisson compound distribution is the
same as a mixed Poisson distribution with a negative binomial mixing distribution.


4.56 For $i = 1, \ldots, n$ let $N_i$ have a mixed Poisson distribution with parameter
$\lambda$. Let the mixing distribution for $N_i$ have pgf $P_i(z)$. Show that $N = N_1 + \cdots + N_n$
has a mixed Poisson distribution and determine the pgf of the mixing
distribution.


4.57 Let $N$ have a Poisson distribution with (given that $\Theta = \theta$) parameter
$\lambda\theta$. Let the distribution of the random variable $\Theta$ have a scale parameter.
Show that the mixed distribution does not depend on the value of $\lambda$.


4.58 Let $N$ have a Poisson distribution with (given that $\Theta = \theta$) parameter
$\theta$. Let the distribution of the random variable $\Theta$ have pdf
$u(\theta) = \alpha^2(\alpha + 1)^{-1}(\theta + 1)e^{-\alpha\theta}$, $\theta > 0$. Determine the pf of the mixed distribution. Also,
show that the mixed distribution is also a compound distribution.


4.59 For the discrete counting random variable $N$ with probabilities
$p_n = \Pr(N = n)$; $n = 0, 1, 2, \ldots$, let $a_n = \Pr(N > n) = \sum_{k=n+1}^{\infty} p_k$; $n = 0, 1, 2, \ldots$.

(a) Demonstrate that $\mathrm{E}(N) = \sum_{n=0}^{\infty} a_n$.

(b) Demonstrate that $A(z) = \sum_{n=0}^{\infty} a_n z^n$ and $P(z) = \sum_{n=0}^{\infty} p_n z^n$ are
related by $A(z) = [1 - P(z)]/(1 - z)$. What happens as $z \to 1$?

(c) Suppose that $N$ has the negative binomial distribution

$$p_n = \binom{n + r - 1}{n}\left(\frac{1}{1 + \beta}\right)^r\left(\frac{\beta}{1 + \beta}\right)^n, \quad n = 0, 1, 2, \ldots,$$

where $r$ is a positive integer. Prove that

$$a_n = \beta\sum_{k=1}^{r}\binom{n + k - 1}{n}\left(\frac{1}{1 + \beta}\right)^k\left(\frac{\beta}{1 + \beta}\right)^n, \quad n = 0, 1, 2, \ldots.$$

(d) Suppose that $N$ has the Sibuya distribution with pgf $P(z) = 1 - (1 - z)^{-r}$, $-1 < r < 0$. Prove that

$$p_n = \frac{(-r)\Gamma(n + r)}{n!\,\Gamma(1 + r)}, \quad n = 1, 2, 3, \ldots,$$

and that

$$a_n = \binom{n + r}{n}, \quad n = 0, 1, 2, \ldots.$$

(e) Suppose that $N$ has the mixed Poisson distribution with

$$p_n = \int_0^{\infty} \frac{(\lambda\theta)^n e^{-\lambda\theta}}{n!}\,dU(\theta), \quad n = 0, 1, 2, \ldots,$$

where $U(\theta)$ is a cumulative distribution function. Prove that

$$a_n = \lambda\int_0^{\infty} \frac{(\lambda\theta)^n e^{-\lambda\theta}}{n!}\,[1 - U(\theta)]\,d\theta, \quad n = 0, 1, 2, \ldots.$$


4.60 Consider the mixed Poisson distribution

$$p_n = \Pr(N = n) = \int_0^1 \frac{(\lambda\theta)^n e^{-\lambda\theta}}{n!}\,dU(\theta),$$

where $U(\theta) = 1 - (1 - \theta)^k$, $0 < \theta < 1$, $k = 1, 2, \ldots$.

(a) Prove that

$$p_n = k e^{-\lambda}\sum_{m=0}^{\infty} \frac{\lambda^{m+n}(m + k - 1)!}{m!\,(m + k + n)!}, \quad n = 0, 1, 2, \ldots.$$

(b) Using Exercise 4.59 prove that

$$\Pr(N > n) = e^{-\lambda}\sum_{m=0}^{\infty} \frac{\lambda^{m+n+1}(m + k)!}{m!\,(m + k + n + 1)!}.$$

(c) When $k = 1$, prove that

$$p_n = \frac{1 - \sum_{m=0}^{n} \lambda^m e^{-\lambda}/m!}{\lambda}, \quad n = 0, 1, 2, \ldots.$$


4.61 Consider the mixed Poisson distribution

$$p_n = \int_0^{\infty} \frac{(\lambda\theta)^n e^{-\lambda\theta}}{n!}\,u(\theta)\,d\theta, \quad n = 0, 1, \ldots,$$

where the pdf $u(\theta)$ is that of the positive stable distribution (see, for example,
Feller [36], pp. 448, 583) given by

$$u(\theta) = \frac{1}{\pi}\sum_{k=1}^{\infty} \frac{\Gamma(k\alpha + 1)}{k!}(-1)^{k-1}\theta^{-k\alpha - 1}\sin(k\alpha\pi), \quad \theta > 0,$$

where $0 < \alpha < 1$. The Laplace transform is $\int_0^{\infty} e^{-s\theta}u(\theta)\,d\theta = \exp(-s^{\alpha})$, $s > 0$.
Prove that $\{p_n;\ n = 0, 1, \ldots\}$ is a compound Poisson distribution with
Sibuya secondary distribution (this mixed Poisson distribution is sometimes
called a discrete stable distribution).


4.62 Consider a mixed Poisson distribution with a reciprocal inverse Gaussian 
distribution as the mixing distribution. 


(a) Use Exercise 4.36 to show that this distribution is the convolution
of a negative binomial distribution and a Poisson-ETNB distribution
with $r = -\frac{1}{2}$ (i.e., a Poisson-inverse Gaussian distribution).


(b) Show that the mixed Poisson distribution in (a) is a compound 
Poisson distribution and identify the secondary distribution. 


Frequency and severity with coverage modifications


5.1 INTRODUCTION 


We have seen a variety of examples that involve functions of random variables. 
In this chapter we relate those functions to insurance applications. Through- 
out this chapter we assume that all random variables have support on all or 
a subset of the nonnegative real numbers. At times in this chapter and later 
in the text we will need to distinguish between a random variable that mea- 
sures the payment per loss (so zero is a possibility, taking place when there 
is a loss without a payment) and a variable that measures the payment per 
payment (the random variable is not defined when there is no payment). For 
notation, a per-loss variable will be denoted $Y^L$ and a per-payment variable
will be denoted $Y^P$. When the distinction is not material (for example, setting
a maximum payment does not create a difference), the superscript will be left
off.


5.2 DEDUCTIBLES 


Insurance policies are often sold with a per-loss deductible of $d$. When the
loss, $x$, is at or below $d$, the insurance pays nothing. When the loss is above
$d$, the insurance pays $x - d$. In the language of Chapter 3 such a deductible
can be defined as follows.


Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.


Definition 5.1 An ordinary deductible modifies a random variable into 
either the excess loss or left censored and shifted variable (see Definition 3.6). 
The difference depends on whether the result of applying the deductible is to 
be per payment or per loss, respectively. 


This concept has already been introduced along with formulas for deter- 
mining its moments. The per-payment variable is 


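$$Y^P = \begin{cases} \text{undefined}, & X \le d, \\ X - d, & X > d, \end{cases}$$

while the per-loss variable is

$$Y^L = \begin{cases} 0, & X \le d, \\ X - d, & X > d. \end{cases}$$

Note that the per-payment variable is $Y^P = Y^L \mid Y^L > 0$. That is, the
per-payment variable is the per-loss variable conditioned on the payment being
positive (equivalently, on the loss exceeding $d$).
For the excess loss/per-payment variable, the density function is

$$f_{Y^P}(y) = \frac{f_X(y + d)}{S_X(d)}, \quad y > 0, \tag{5.1}$$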


noting that for a discrete distribution the density function need only be re- 
placed by the probability function. Other key functions are 


$$S_{Y^P}(y) = \frac{S_X(y + d)}{S_X(d)},$$

$$F_{Y^P}(y) = \frac{F_X(y + d) - F_X(d)}{1 - F_X(d)},$$

$$h_{Y^P}(y) = \frac{f_X(y + d)}{S_X(y + d)} = h_X(y + d).$$


Note that as a per-payment variable the excess loss variable places no proba- 
bility at 0. 

The left censored and shifted variable has discrete probability at zero of 
Fx(d) representing the probability that a payment of zero is made because 
the loss did not exceed d. Above zero, the density function is 


$$f_{Y^L}(y) = f_X(y + d), \quad y > 0, \tag{5.2}$$

while the other key functions are¹ (for $y \ge 0$)

$$S_{Y^L}(y) = S_X(y + d),$$

$$F_{Y^L}(y) = F_X(y + d).$$

It is important to recognize that when counting claims on a per-payment basis,
changing the deductible will change the frequency with which payments are
made (while the frequency of losses will be unchanged). The nature of these
changes will be discussed in Section 5.6.


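Example 5.2 Determine similar quantities for a Pareto distribution with
$\alpha = 3$ and $\theta = 2{,}000$ for an ordinary deductible of 500.


Using the above formulas, for the excess loss variable,

$$f_{Y^P}(y) = \frac{3(2{,}000)^3(2{,}000 + y + 500)^{-4}}{(2{,}000)^3(2{,}000 + 500)^{-3}} = \frac{3(2{,}500)^3}{(2{,}500 + y)^4},$$

$$S_{Y^P}(y) = \left(\frac{2{,}500}{2{,}500 + y}\right)^3,$$

$$F_{Y^P}(y) = 1 - \left(\frac{2{,}500}{2{,}500 + y}\right)^3,$$

$$h_{Y^P}(y) = \frac{3}{2{,}500 + y}.$$

Note that this is a Pareto distribution with $\alpha = 3$ and $\theta = 2{,}500$. For the left
censored and shifted variable,

$$f_{Y^L}(y) = \begin{cases} 0.488, & y = 0, \\ \dfrac{3(2{,}000)^3}{(2{,}500 + y)^4}, & y > 0, \end{cases} \qquad S_{Y^L}(y) = \begin{cases} 0.512, & y = 0, \\ \dfrac{(2{,}000)^3}{(2{,}500 + y)^3}, & y > 0, \end{cases}$$

$$F_{Y^L}(y) = \begin{cases} 0.488, & y = 0, \\ 1 - \dfrac{(2{,}000)^3}{(2{,}500 + y)^3}, & y > 0, \end{cases} \qquad h_{Y^L}(y) = \begin{cases} \text{undefined}, & y = 0, \\ \dfrac{3}{2{,}500 + y}, & y > 0. \end{cases}$$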

Figure 5.1 contains a plot of the densities. The modified densities are 
created as follows. For the excess loss variable, take the portion of the original 
density from 500 and above. Then shift it to start at zero and multiply it by 
a constant so that the area under it is still 1. The left censored and shifted 
variable also takes the original density function above 500 and shifts it to the 
origin, but then leaves it alone. The remaining probability is concentrated at 
zero, rather than spread out. □
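
The quantities in Example 5.2 can be reproduced with a few lines of Python. This is a minimal sketch that assumes the Pareto survival function $S_X(x) = [\theta/(x + \theta)]^{\alpha}$ used throughout the example; the function names are hypothetical.

```python
def pareto_sf(x, alpha, theta):
    """Survival function S_X(x) of the Pareto distribution."""
    return (theta / (x + theta)) ** alpha


def pareto_pdf(x, alpha, theta):
    """Density f_X(x) of the Pareto distribution."""
    return alpha * theta ** alpha / (x + theta) ** (alpha + 1)


def excess_loss_pdf(y, d, alpha, theta):
    """Per-payment density (5.1): f_X(y + d) / S_X(d), for y > 0."""
    return pareto_pdf(y + d, alpha, theta) / pareto_sf(d, alpha, theta)


def excess_loss_sf(y, d, alpha, theta):
    """Per-payment survival function S_X(y + d) / S_X(d)."""
    return pareto_sf(y + d, alpha, theta) / pareto_sf(d, alpha, theta)


def left_censored_shifted_sf(y, d, alpha, theta):
    """Per-loss survival function S_X(y + d), for y >= 0."""
    return pareto_sf(y + d, alpha, theta)


ALPHA, THETA, D = 3, 2000, 500
prob_zero = 1 - pareto_sf(D, ALPHA, THETA)  # F_X(500), the mass at zero
print(round(prob_zero, 3))

# The excess loss variable is again Pareto, now with theta = 2,500.
print(excess_loss_sf(1000, D, ALPHA, THETA), pareto_sf(1000, ALPHA, 2500))
```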


¹The hazard rate function is not presented because it is not defined at zero, making it
of limited value. Note that for the excess loss variable the hazard rate function is simply 
shifted. 
[Figure: plot of the original, excess loss, and left censored/shifted densities; the left censored/shifted distribution has probability 0.488 at zero.]

Fig. 5.1 Densities for Example 5.2.


An alternative to the ordinary deductible is the franchise deductible. This 
deductible differs from the ordinary deductible in that, when the loss exceeds 
the deductible, the loss is paid in full. One example is in disability insurance 
where, for example, if a disability lasts seven or fewer days, no benefits are 
paid. However, if the disability lasts more than seven days, daily benefits are 
paid retroactively to the onset of the disability. 


Definition 5.3 A franchise deductible modifies the ordinary deductible by 
adding the deductible when there is a positive amount paid. 


The terms left censored and shifted and excess loss are not used here. 
Because this modification is unique to insurance applications, we will use 
per-payment and per-loss terminology. The per-loss variable is 


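$$Y^L = \begin{cases} 0, & X \le d, \\ X, & X > d, \end{cases}$$

while the per-payment variable is

$$Y^P = \begin{cases} \text{undefined}, & X \le d, \\ X, & X > d. \end{cases}$$

Note that, as usual, the per-payment variable is a conditional random variable.


The quantities derived above for the ordinary deductible are now

$$f_{Y^L}(y) = \begin{cases} F_X(d), & y = 0, \\ f_X(y), & y > d, \end{cases} \qquad S_{Y^L}(y) = \begin{cases} S_X(d), & 0 \le y \le d, \\ S_X(y), & y > d, \end{cases}$$

$$F_{Y^L}(y) = \begin{cases} F_X(d), & 0 \le y \le d, \\ F_X(y), & y > d, \end{cases} \qquad h_{Y^L}(y) = \begin{cases} 0, & 0 < y < d, \\ h_X(y), & y > d, \end{cases}$$

for the per-loss variable and

$$f_{Y^P}(y) = \frac{f_X(y)}{S_X(d)}, \quad y > d, \qquad S_{Y^P}(y) = \begin{cases} 1, & 0 \le y \le d, \\ \dfrac{S_X(y)}{S_X(d)}, & y > d, \end{cases}$$

$$F_{Y^P}(y) = \begin{cases} 0, & 0 \le y \le d, \\ \dfrac{F_X(y) - F_X(d)}{1 - F_X(d)}, & y > d, \end{cases} \qquad h_{Y^P}(y) = \begin{cases} 0, & 0 < y < d, \\ h_X(y), & y > d, \end{cases}$$

for the per-payment variable.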
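Example 5.4 Repeat Example 5.2 for a franchise deductible.


Using the above formulas for the per-payment variable, for $y > 500$,

$$f_{Y^P}(y) = \frac{3(2{,}000)^3(2{,}000 + y)^{-4}}{(2{,}000)^3(2{,}000 + 500)^{-3}} = \frac{3(2{,}500)^3}{(2{,}000 + y)^4},$$

$$S_{Y^P}(y) = \left(\frac{2{,}500}{2{,}000 + y}\right)^3,$$

$$F_{Y^P}(y) = 1 - \left(\frac{2{,}500}{2{,}000 + y}\right)^3,$$

$$h_{Y^P}(y) = \frac{3}{2{,}000 + y}.$$

For the per-loss variable,

$$f_{Y^L}(y) = \begin{cases} 0.488, & y = 0, \\ \dfrac{3(2{,}000)^3}{(2{,}000 + y)^4}, & y > 500, \end{cases}$$

$$S_{Y^L}(y) = \begin{cases} 0.512, & 0 \le y \le 500, \\ \dfrac{(2{,}000)^3}{(2{,}000 + y)^3}, & y > 500, \end{cases}$$

$$F_{Y^L}(y) = \begin{cases} 0.488, & 0 \le y \le 500, \\ 1 - \dfrac{(2{,}000)^3}{(2{,}000 + y)^3}, & y > 500, \end{cases}$$

$$h_{Y^L}(y) = \begin{cases} 0, & 0 < y < 500, \\ \dfrac{3}{2{,}000 + y}, & y > 500. \end{cases}$$

□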


Expected costs for the two types of deductible may also be calculated. 
Theorem 5.5 For an ordinary deductible, the expected cost per loss is
$\mathrm{E}(X) - \mathrm{E}(X \wedge d)$ and the expected cost per payment is
$[\mathrm{E}(X) - \mathrm{E}(X \wedge d)]/[1 - F(d)]$. For
a franchise deductible the expected cost per loss is $\mathrm{E}(X) - \mathrm{E}(X \wedge d) + d[1 - F(d)]$
and the expected cost per payment is $[\mathrm{E}(X) - \mathrm{E}(X \wedge d)]/[1 - F(d)] + d$.


Proof: For the per-loss expectation with an ordinary deductible, we have,
from (3.7) and (3.10) that the expectation is $\mathrm{E}(X) - \mathrm{E}(X \wedge d)$. From (5.1)
and (5.2) we see that, to change to a per-payment basis, division by $1 - F(d)$
is required. The adjustments for the franchise deductible come from the fact
that when there is a payment it will exceed that for the ordinary deductible
by $d$. □


Example 5.6 Determine the four expectations for the Pareto distribution 
from Examples 5.2 and 5.4 using a deductible of 500. 


Expectations could be derived directly from the density functions obtained 
in Examples 5.2 and 5.4. Using Theorem 5.5 and recognizing that we have a 
Pareto distribution, we can also look up the required values (the formulas are 
in Appendix A). That is, 


$$F(500) = 1 - \left(\frac{2{,}000}{2{,}000 + 500}\right)^3 = 0.488,$$

$$\mathrm{E}(X \wedge 500) = \frac{2{,}000}{2}\left[1 - \left(\frac{2{,}000}{2{,}000 + 500}\right)^2\right] = 360.$$

With $\mathrm{E}(X) = 1{,}000$ we have, for the ordinary deductible, the expected cost per
loss is 1,000 − 360 = 640 while the expected cost per payment is 640/0.512 =
1,250. For the franchise deductible the expectations are 640 + 500(1 − 0.488) =
896 and 1,250 + 500 = 1,750. □
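
The four expectations of Theorem 5.5 are easy to script once $\mathrm{E}(X)$, $\mathrm{E}(X \wedge d)$, and $F(d)$ are available. The sketch below uses the Pareto formulas quoted in Example 5.6; the function names and the dictionary keys are hypothetical.

```python
def pareto_cdf(x, alpha, theta):
    """Distribution function F(x) of the Pareto distribution."""
    return 1 - (theta / (x + theta)) ** alpha


def pareto_mean(alpha, theta):
    """E(X); requires alpha > 1."""
    return theta / (alpha - 1)


def pareto_lev(x, alpha, theta):
    """Limited expected value E(X ^ x); this form requires alpha != 1."""
    return theta / (alpha - 1) * (1 - (theta / (x + theta)) ** (alpha - 1))


def expected_costs(d, alpha, theta):
    """Expected costs from Theorem 5.5 for a deductible d."""
    excess = pareto_mean(alpha, theta) - pareto_lev(d, alpha, theta)
    pay_prob = 1 - pareto_cdf(d, alpha, theta)  # probability a payment is made
    return {
        "ordinary_per_loss": excess,
        "ordinary_per_payment": excess / pay_prob,
        "franchise_per_loss": excess + d * pay_prob,
        "franchise_per_payment": excess / pay_prob + d,
    }


costs = expected_costs(500, 3, 2000)
for name, value in costs.items():
    print(f"{name}: {value:.2f}")
```

Up to rounding, the printed values match the 640, 1,250, 896, and 1,750 obtained in Example 5.6.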


5.2.1 Exercises 


5.1 Perform the calculations in Example 5.2 for the following distribution 
(which is Model 4 on page 15) using an ordinary deductible of 5,000. 


$$F_4(x) = \begin{cases} 0, & x < 0, \\ 1 - 0.3e^{-0.00001x}, & x \ge 0. \end{cases}$$


5.2 Repeat Exercise 5.1 for a franchise deductible. 
5.3 Repeat Example 5.6 for the model in Exercise 5.1 and a 5,000 deductible. 


5.4 (*) Risk 1 has a Pareto distribution with parameters $\alpha > 2$ and $\theta$. Risk
2 has a Pareto distribution with parameters $0.8\alpha$ and $\theta$. Each risk is covered
by a separate policy, each with an ordinary deductible of $k$. Determine the
expected cost per loss for risk 1. Determine the limit as $k$ goes to infinity of
the ratio of the expected cost per loss for risk 2 to the expected cost per loss
for risk 1.


5.5 (*) Losses (prior to any deductibles being applied) have a distribution
as reflected in Table 5.1. There is a per-loss ordinary deductible of 10,000.
The deductible is then raised so that half as many losses exceed the new
deductible as exceeded the old deductible. Determine the percentage change
in the expected cost per payment when the deductible is raised.


Table 5.1 Data for Exercise 5.5

x      | F(x) | E(X ∧ x)
10,000 | 0.60 | 6,000
15,000 | 0.70 | 7,700
22,500 | 0.80 | 9,500
32,500 | 0.90 | 11,000
∞      | 1.00 | 20,000


5.3 THE LOSS ELIMINATION RATIO AND THE EFFECT OF 
INFLATION FOR ORDINARY DEDUCTIBLES 


A ratio that can be meaningful in evaluating the impact of a deductible is the 
loss elimination ratio. 


Definition 5.7 The loss elimination ratio is the ratio of the decrease in
the expected payment with an ordinary deductible to the expected payment
without the deductible.


While many types of coverage modifications can decrease the expected
payment, the term loss elimination ratio is reserved for the effect of changing
the deductible. Without the deductible, the expected payment is $\mathrm{E}(X)$. With
the deductible, the expected payment (from Theorem 5.5) is $\mathrm{E}(X) - \mathrm{E}(X \wedge d)$.
Therefore the loss elimination ratio is

$$\frac{\mathrm{E}(X) - [\mathrm{E}(X) - \mathrm{E}(X \wedge d)]}{\mathrm{E}(X)} = \frac{\mathrm{E}(X \wedge d)}{\mathrm{E}(X)},$$

provided $\mathrm{E}(X)$ exists.


Example 5.8 Determine the loss elimination ratio for the Pareto distribution
with $\alpha = 3$ and $\theta = 2{,}000$ with an ordinary deductible of 500.


From Example 5.6 we have a loss elimination ratio of 360/1,000 = 0.36.
Thus 36% of losses can be eliminated by introducing an ordinary deductible
of 500. □
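
When no closed form for $\mathrm{E}(X \wedge d)$ is at hand, the loss elimination ratio can be approximated numerically. The sketch below relies on the standard identity $\mathrm{E}(X \wedge d) = \int_0^d S_X(x)\,dx$ for nonnegative random variables and on composite Simpson's rule; both the integration routine and the function names are illustrative choices, not the method of the text.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def loss_elimination_ratio(sf, mean, d):
    """E(X ^ d) / E(X), with E(X ^ d) computed as the integral of S_X over [0, d]."""
    return simpson(sf, 0.0, d) / mean


def pareto_sf(x):
    """Survival function of the Pareto distribution with alpha = 3, theta = 2,000."""
    return (2000 / (x + 2000)) ** 3


ler = loss_elimination_ratio(pareto_sf, 1000.0, 500.0)
print(round(ler, 4))
```

For the Pareto distribution of Example 5.8 the result agrees with 0.36 to well within the accuracy of the quadrature.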
Inflation increases costs, but it turns out that when there is a deductible
the effect of inflation is magnified. First, some events that formerly produced
losses below the deductible will now lead to payments. Second, the relative
effect of inflation is magnified because the deductible is subtracted after
inflation. For example, suppose an event formerly produced a loss of 600. With a
500 deductible the payment is 100. Inflation at 10% will increase the loss to
660 and the payment to 160, a 60% increase in the cost to the insurer.


Theorem 5.9 For an ordinary deductible of $d$ after uniform inflation of $1 + r$,
the expected cost per loss is

$$(1 + r)\{\mathrm{E}(X) - \mathrm{E}[X \wedge d/(1 + r)]\}.$$

If $F[d/(1 + r)] < 1$, then the expected cost per payment is obtained by dividing
by $1 - F[d/(1 + r)]$.


Proof: After inflation, losses are given by the random variable $Y = (1 + r)X$.
From Theorem 4.19, $f_Y(y) = f_X[y/(1 + r)]/(1 + r)$ and $F_Y(y) = F_X[y/(1 + r)]$.
Using (3.8),

$$\begin{aligned}
\mathrm{E}(Y \wedge d) &= \int_0^d y f_Y(y)\,dy + d[1 - F_Y(d)] \\
&= \int_0^d \frac{y f_X[y/(1 + r)]}{1 + r}\,dy + d\left[1 - F_X\left(\frac{d}{1 + r}\right)\right] \\
&= \int_0^{d/(1+r)} (1 + r)x f_X(x)\,dx + d\left[1 - F_X\left(\frac{d}{1 + r}\right)\right] \\
&= (1 + r)\left\{\int_0^{d/(1+r)} x f_X(x)\,dx + \frac{d}{1 + r}\left[1 - F_X\left(\frac{d}{1 + r}\right)\right]\right\} \\
&= (1 + r)\,\mathrm{E}\left(X \wedge \frac{d}{1 + r}\right),
\end{aligned}$$

where the third line follows from the substitution $x = y/(1 + r)$. Noting
that $\mathrm{E}(Y) = (1 + r)\mathrm{E}(X)$ completes the first statement of the theorem and
the per-payment result follows from the relationship between the distribution
functions of $Y$ and $X$. □


Example 5.10 Determine the effect of inflation at 10% on an ordinary
deductible of 500 applied to a Pareto distribution with $\alpha = 3$ and $\theta = 2{,}000$.


From Example 5.6 the expected costs are 640 and 1,250 per loss and per
payment respectively. With 10% inflation we need

$$\mathrm{E}\left(X \wedge \frac{500}{1.1}\right) = \mathrm{E}(X \wedge 454.55) = \frac{2{,}000}{2}\left[1 - \left(\frac{2{,}000}{2{,}000 + 454.55}\right)^2\right] = 336.08.$$

The expected cost per loss after inflation is 1.1(1,000 − 336.08) = 730.32, an
increase of 14.11%. On a per-payment basis we need

$$F_Y(500) = F_X(454.55) = 1 - \left(\frac{2{,}000}{2{,}000 + 454.55}\right)^3 = 0.459.$$

The expected cost per payment is 730.32/(1 − 0.459) = 1,350, an increase of
8%. □


5.3.1 Exercises


5.6 Determine the loss elimination ratio for the distribution given below with
an ordinary deductible of 5,000. This is the same model used in Exercise 5.1.

$$F_4(x) = \begin{cases} 0, & x < 0, \\ 1 - 0.3e^{-0.00001x}, & x \ge 0. \end{cases}$$


5.7 Determine the effect of inflation at 10% on an ordinary deductible of
5,000 applied to the distribution in Exercise 5.6.


5.8 (*) Losses have a lognormal distribution with $\mu = 7$ and $\sigma = 2$. There
is a deductible of 2,000 and 10 losses are expected each year. Determine the
loss elimination ratio. If there is uniform inflation of 20% but the deductible
remains at 2,000, how many payments will be expected?


5.9 (*) Losses have a Pareto distribution with $\alpha = 2$ and $\theta = k$. There is
an ordinary deductible of $2k$. Determine the loss elimination ratio before and
after 100% inflation.


5.10 (*) Losses have an exponential distribution with a mean of 1,000. There
is a deductible of 500. Determine the amount by which the deductible would
have to be raised to double the loss elimination ratio.


5.11 (*) The values in Table 5.2 are available for a random variable $X$. There
is a deductible of 15,000 per loss and no policy limit. Determine the expected
cost per payment using $X$ and then assuming 50% inflation (with the
deductible remaining at 15,000).


5.12 (*) Losses have a lognormal distribution with $\mu = 6.9078$ and $\sigma = 1.5174$.
Determine the ratio of the loss elimination ratio at 10,000 to the loss
elimination ratio at 1,000. Then determine the percentage increase in the
number of losses that exceed 1,000 if all losses are increased by 10%.


5.13 (*) Losses have a mean of 2,000. With a deductible of 1,000 the loss 
elimination ratio is 0.3. The probability of a loss being greater than 1,000 is 
0.4. Determine the average size of a loss that is less than or equal to 1,000. 
Table 5.2 Data for Exercise 5.11

x      | F(x) | E(X ∧ x)
10,000 | 0.60 | 6,000
15,000 | 0.70 | 7,700
22,500 | 0.80 | 9,500
∞      | 1.00 | 20,000


5.4 POLICY LIMITS 


The opposite of a deductible is a policy limit. The typical policy limit arises 
in a contract where for losses below u the insurance pays the full loss but for 
losses above u the insurance pays only u. The effect of the limit is to produce 
a right censored random variable. It will have a mixed distribution with 
distribution and density function given by (where Y is the random variable 
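after the limit has been imposed)

$$F_Y(y) = \begin{cases} F_X(y), & y < u, \\ 1, & y \ge u, \end{cases}$$

and

$$f_Y(y) = \begin{cases} f_X(y), & y < u, \\ 1 - F_X(u), & y = u. \end{cases}$$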


The effect of inflation can be calculated as follows. 


Theorem 5.11 For a policy limit of $u$, after uniform inflation of $1 + r$, the
expected cost is $(1 + r)\mathrm{E}[X \wedge u/(1 + r)]$.


Proof: The expected cost is $\mathrm{E}(Y \wedge u)$. The proof of Theorem 5.9 shows that
this equals the expression given in this theorem. □


For policy limits the concept of per payment and per loss is not relevant. 
All losses that produced payments prior to imposing the limit will produce 
payments after the limit is imposed. 


Example 5.12 Impose a limit of 3,000 on a Pareto distribution with $\alpha = 3$
and $\theta = 2{,}000$. Determine the expected cost per loss with the limit as well
as the proportional reduction in expected cost. Repeat these calculations after
10% uniform inflation is applied.


For this Pareto distribution, the expected cost is

$$\mathrm{E}(X \wedge 3{,}000) = \frac{2{,}000}{2}\left[1 - \left(\frac{2{,}000}{2{,}000 + 3{,}000}\right)^2\right] = 840$$

and the proportional reduction is (1,000 − 840)/1,000 = 0.16. After inflation
the expected cost is

$$1.1\,\mathrm{E}(X \wedge 3{,}000/1.1) = 1.1\,\frac{2{,}000}{2}\left[1 - \left(\frac{2{,}000}{2{,}000 + 3{,}000/1.1}\right)^2\right] = 903.11$$

for a proportional reduction of (1,100 − 903.11)/1,100 = 0.179. Also note
that after inflation the expected cost has increased 7.51%, less than the general
inflation rate. The effect is the opposite of that for the deductible: inflation is
tempered, not exacerbated.

Figure 5.2 shows the density function for the right censored random variable.
From 0 to 3,000 it matches the original Pareto distribution. The probability
of exceeding 3,000, $\Pr(X > 3{,}000) = (2{,}000/5{,}000)^3 = 0.064$, is concentrated
at 3,000. □

[Figure: density of the right censored variable plotted against x from 0 to 3,500, with 0.064 of probability at 3,000.]

Fig. 5.2 Density function for Example 5.12.
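
The contrast between a limit and a deductible under inflation can be seen by evaluating Theorems 5.9 and 5.11 side by side. The following Python sketch does so for the Pareto distribution of Example 5.12; the function names are hypothetical, and the limited expected value formula is the Pareto one used in the examples.

```python
def pareto_lev(x, alpha, theta):
    """Limited expected value E(X ^ x) for the Pareto distribution (alpha != 1)."""
    return theta / (alpha - 1) * (1 - (theta / (x + theta)) ** (alpha - 1))


def limit_cost(u, r, alpha, theta):
    """Theorem 5.11: expected cost (1 + r) E[X ^ u/(1 + r)] under a limit u."""
    return (1 + r) * pareto_lev(u / (1 + r), alpha, theta)


def deductible_cost_per_loss(d, r, alpha, theta):
    """Theorem 5.9: expected cost per loss (1 + r){E(X) - E[X ^ d/(1 + r)]}."""
    mean = theta / (alpha - 1)
    return (1 + r) * (mean - pareto_lev(d / (1 + r), alpha, theta))


ALPHA, THETA, R = 3, 2000, 0.10
limit_growth = limit_cost(3000, R, ALPHA, THETA) / limit_cost(3000, 0.0, ALPHA, THETA) - 1
deductible_growth = (
    deductible_cost_per_loss(500, R, ALPHA, THETA)
    / deductible_cost_per_loss(500, 0.0, ALPHA, THETA)
    - 1
)
print(f"limit of 3,000: {limit_growth:.2%}")
print(f"deductible of 500: {deductible_growth:.2%}")
```

With 10% inflation the cost under the limit grows by less than 10%, while the cost under the deductible grows by more, in line with Examples 5.10 and 5.12.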


A policy limit and an ordinary deductible go together in the sense that, 
whichever applies to the insurance company’s payments, the other applies to 
the policyholder’s payments. For example, when the policy has a deductible 
of 500, the cost per loss to the policyholder is a random variable that is right 
censored at 500. When the policy has a limit of 3,000, the policyholder’s 
payments are a variable that is left truncated and shifted (as in an ordinary 
deductible). The opposite of the franchise deductible is a coverage that right 
truncates any losses (see Exercise 3.12). This coverage is rarely, if ever, sold. 
(Would you buy a policy that pays you nothing if your loss exceeds u?) 




5.4.1 Exercises 


5.14 Determine the effect of 10% inflation on a policy limit of 150,000 on the 
following distribution. This is the same distribution used in Exercises 5.1 and 
5.6. 

$$F_4(x) = \begin{cases} 0, & x < 0, \\ 1 - 0.3e^{-0.00001x}, & x \ge 0. \end{cases}$$


5.15 (*) Let $X$ have a Pareto distribution with $\alpha = 2$ and $\theta = 100$. Determine
the range of the mean residual life function $e(d)$ as $d$ ranges over all positive
numbers. Then let $Y = 1.1X$. Determine the range of the ratio $e_Y(d)/e_X(d)$
as $d$ ranges over all positive numbers. Finally, let $Z$ be $X$ right censored at
500 (that is, a limit of 500 is applied to $X$). Determine the range of $e_Z(d)$ as
$d$ ranges over the interval 0 to 500.


5.5 COINSURANCE, DEDUCTIBLES, AND LIMITS 


The final common coverage modification is coinsurance. In this case the
insurance company pays a proportion, $\alpha$, of the loss and the policyholder pays
the remaining fraction. If coinsurance is the only modification, this changes
the loss variable $X$ to the payment variable, $Y = \alpha X$. The effect of
multiplication has already been covered. When all four items covered in this chapter
are present (ordinary deductible, limit, coinsurance, and inflation), we create
the following per-loss random variable:

$$Y^L = \begin{cases} 0, & X < \dfrac{d}{1 + r}, \\[1ex] \alpha[(1 + r)X - d], & \dfrac{d}{1 + r} \le X < \dfrac{u}{1 + r}, \\[1ex] \alpha(u - d), & X \ge \dfrac{u}{1 + r}. \end{cases}$$

For this definition, the quantities are applied in a particular order. In
particular, the coinsurance is applied last. For the contract illustrated above, the
policy limit is $\alpha(u - d)$, the maximum amount payable. In this definition, $u$
is the loss above which no additional benefits are paid and will be called the
maximum covered loss. For the per-payment variable, $Y^P$ is undefined for
$X < d/(1 + r)$.
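
The order in which the four modifications are applied can be made concrete with a small function. This is a sketch only: the name `payment_per_loss` and its keyword arguments are hypothetical, and `coins` stands for the coinsurance proportion called $\alpha$ in the text.

```python
import math


def payment_per_loss(x, d=0.0, u=math.inf, coins=1.0, r=0.0):
    """Per-loss payment for a loss x stated in pre-inflation units.

    Order of application: inflation at rate r, ordinary deductible d,
    maximum covered loss u, and finally the coinsurance proportion.
    """
    inflated = (1 + r) * x
    if inflated < d:
        return 0.0
    return coins * (min(inflated, u) - d)


# A loss of 600 with a 500 deductible, before and after 10% inflation.
before = payment_per_loss(600, d=500)
after = payment_per_loss(600, d=500, r=0.10)
print(before, round(after, 2))

# A large loss is capped at coins * (u - d).
print(payment_per_loss(4000, d=500, u=3000, coins=0.8))
```

The first line of output illustrates the earlier observation that 10% inflation can raise a payment of 100 to 160 when the deductible is left unchanged.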

Previous results can be combined to produce the following theorem, given 
without proof. 


Theorem 5.13 For the per-loss variable,

$$\mathrm{E}(Y^L) = \alpha(1 + r)\left[\mathrm{E}\left(X \wedge \frac{u}{1 + r}\right) - \mathrm{E}\left(X \wedge \frac{d}{1 + r}\right)\right].$$

The expected value of the per-payment variable is obtained as

$$\mathrm{E}(Y^P) = \frac{\mathrm{E}(Y^L)}{1 - F_X\left(\frac{d}{1 + r}\right)}.$$

Higher moments are more difficult. The next theorem gives the formula for
the second moment. The variance can then be obtained by subtracting the
square of the mean.


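Theorem 5.14 For the per-loss variable

$$\mathrm{E}[(Y^L)^2] = \alpha^2(1 + r)^2\{\mathrm{E}[(X \wedge u^*)^2] - \mathrm{E}[(X \wedge d^*)^2] - 2d^*\mathrm{E}(X \wedge u^*) + 2d^*\mathrm{E}(X \wedge d^*)\},$$

where $u^* = u/(1 + r)$ and $d^* = d/(1 + r)$. For the second moment of the
per-payment variable, divide this expression by $1 - F_X(d^*)$.


Proof: From the definition of $Y^L$,

$$\mathrm{E}[(Y^L)^2] = \int_{d^*}^{u^*} \alpha^2[(1 + r)x - d]^2 f(x)\,dx + \int_{u^*}^{\infty} \alpha^2(u - d)^2 f(x)\,dx,$$

$$\begin{aligned}
\frac{\mathrm{E}[(Y^L)^2]}{\alpha^2} &= (1 + r)^2\left[\int_0^{u^*} x^2 f(x)\,dx - \int_0^{d^*} x^2 f(x)\,dx\right] \\
&\quad - 2(1 + r)d\left[\int_0^{u^*} x f(x)\,dx - \int_0^{d^*} x f(x)\,dx\right] \\
&\quad + d^2[F(u^*) - F(d^*)] + (u - d)^2[1 - F(u^*)] \\
&= (1 + r)^2\{\mathrm{E}[(X \wedge u^*)^2] - (u^*)^2[1 - F(u^*)] \\
&\quad - \mathrm{E}[(X \wedge d^*)^2] + (d^*)^2[1 - F(d^*)]\} \\
&\quad - 2(1 + r)^2 d^*\{\mathrm{E}(X \wedge u^*) - u^*[1 - F(u^*)] \\
&\quad - \mathrm{E}(X \wedge d^*) + d^*[1 - F(d^*)]\} \\
&\quad + (1 + r)^2 (d^*)^2[F(u^*) - F(d^*)] \\
&\quad + (1 + r)^2 (u^* - d^*)^2[1 - F(u^*)],
\end{aligned}$$

$$\frac{\mathrm{E}[(Y^L)^2]}{\alpha^2(1 + r)^2} = \mathrm{E}[(X \wedge u^*)^2] - \mathrm{E}[(X \wedge d^*)^2] - 2d^*[\mathrm{E}(X \wedge u^*) - \mathrm{E}(X \wedge d^*)].$$

□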
Example 5.15 Determine the mean and standard deviation per loss for a
Pareto distribution with $\alpha = 3$ and $\theta = 2,000$ with a deductible of
500 and a policy limit of 2,500. Note that the maximum covered loss is
$u = 3,000$.


From earlier examples, $E(X \wedge 500) = 360$ and $E(X \wedge 3,000) = 840$. The
second limited moment is

$$\begin{aligned}
E[(X \wedge u)^2] &= \int_0^u x^2 \frac{3(2,000)^3}{(x + 2,000)^4}\,dx + u^2\left(\frac{2,000}{u + 2,000}\right)^3 \\
&= 3(2,000)^3 \int_{2,000}^{u+2,000} (y - 2,000)^2 y^{-4}\,dy + u^2\left(\frac{2,000}{u + 2,000}\right)^3 \\
&= 3(2,000)^3 \left[-y^{-1} + 2,000y^{-2} - \frac{2,000^2}{3}y^{-3}\right]_{2,000}^{u+2,000} + u^2\left(\frac{2,000}{u + 2,000}\right)^3 \\
&= 3(2,000)^3 \left[-\frac{1}{u + 2,000} + \frac{2,000}{(u + 2,000)^2} - \frac{2,000^2}{3(u + 2,000)^3}\right] \\
&\quad + 3(2,000)^3 \left[\frac{1}{2,000} - \frac{2,000}{2,000^2} + \frac{2,000^2}{3(2,000)^3}\right] + u^2\left(\frac{2,000}{u + 2,000}\right)^3 \\
&= (2,000)^2 - (2,000)^3(2u + 2,000)(u + 2,000)^{-2}.
\end{aligned}$$

Then, $E[(X \wedge 500)^2] = 160,000$ and $E[(X \wedge 3,000)^2] = 1,440,000$ and so

$$\begin{aligned}
E(Y) &= 840 - 360 = 480, \\
E(Y^2) &= 1,440,000 - 160,000 - 2(500)(840) + 2(500)(360) = 800,000
\end{aligned}$$

for a variance of $800,000 - 480^2 = 569,600$ and a standard deviation of
754.72. □
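
The arithmetic of Example 5.15 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, the first limited moment uses the standard Pareto formula, and the second limited moment uses the closed form derived in the example, which is assumed to hold only for $\alpha = 3$. The per-loss moments follow Theorems 5.13 and 5.14.

```python
import math

def pareto_limited_mean(u, alpha, theta):
    """E(X ^ u) for a Pareto(alpha, theta) loss; requires alpha != 1."""
    return theta / (alpha - 1) * (1 - (theta / (u + theta)) ** (alpha - 1))

def pareto3_limited_second_moment(u, theta):
    """E[(X ^ u)^2] for alpha = 3 only (closed form from Example 5.15)."""
    return theta**2 - theta**3 * (2 * u + theta) / (u + theta) ** 2

def per_loss_moments(d, u, coins, r, theta):
    """Mean and second moment of the per-loss payment, alpha = 3 Pareto.

    d: deductible, u: maximum covered loss, coins: coinsurance factor,
    r: inflation rate (Theorems 5.13 and 5.14).
    """
    d_star, u_star = d / (1 + r), u / (1 + r)
    m_u = pareto_limited_mean(u_star, 3, theta)
    m_d = pareto_limited_mean(d_star, 3, theta)
    s_u = pareto3_limited_second_moment(u_star, theta)
    s_d = pareto3_limited_second_moment(d_star, theta)
    mean = coins * (1 + r) * (m_u - m_d)
    second = (coins * (1 + r)) ** 2 * (s_u - s_d - 2 * d_star * (m_u - m_d))
    return mean, second

# Example 5.15: no coinsurance adjustment (coins = 1) and no inflation (r = 0).
mean, second = per_loss_moments(d=500, u=3000, coins=1.0, r=0.0, theta=2000)
std_dev = math.sqrt(second - mean**2)
print(round(mean, 2), round(second, 2), round(std_dev, 2))
```

With the inputs of the example this reproduces the mean of 480, the second moment of 800,000, and the standard deviation of 754.72.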


5.5.1 Exercises 


5.16 (*) You are given that $e(0) = 25$, $S(x) = 1 - x/\omega$, $0 < x < \omega$, and $Y$ is
the excess loss variable for $d = 10$. Determine the variance of $Y$.


5.17 (*) The loss ratio (R) is defined as total losses (L) divided by earned 
premiums (P). An agent will receive a bonus (B) if the loss ratio on his 
business is less than 0.7. The bonus is given as B = P(0.7 — R)/3 if this 
quantity is positive, otherwise it is zero. Let P = 500,000 and L have a 
Pareto distribution with parameters $\alpha = 3$ and $\theta = 600,000$. Determine the
expected value of the bonus. 


5.18 (*) Losses this year have a distribution such that
$E(X \wedge d) = -0.025d^2 + 1.475d - 2.25$ for $d = 10, 11, 12, \ldots, 26$. Next
year, losses will be uniformly higher by 10%. An insurance policy reimburses
100% of losses subject to a deductible of 11 up to a maximum reimbursement of
11. Determine the ratio of next year’s reimbursements to this year’s
reimbursements.


5.19 (*) Losses have an exponential distribution with a mean of 1,000. An 
insurance company will pay the amount of each claim in excess of a deductible 
of 100. Determine the variance of the amount paid by the insurance company 
for one claim, including the possibility that the amount paid is zero. 


5.20 (*) Total claims for a health plan have a Pareto distribution with $\alpha = 2$
and $\theta = 500$. The health plan implements an incentive to physicians that
will pay a bonus of 50% of the amount by which total claims are less than
500; otherwise no bonus is paid. It is anticipated that with the incentive plan
the claim distribution will change to become Pareto with $\alpha = 2$ and $\theta = K$.
With the new distribution it turns out that expected claims plus the expected 
bonus is equal to expected claims prior to the bonus system. Determine the 
value of K. 


5.21 (*) In year a, total expected losses are 10,000,000. Individual losses in 
year a have a Pareto distribution with $\alpha = 2$ and $\theta = 2,000$. A reinsurer
pays the excess of each individual loss over 3,000. For this, the reinsurer is 
paid a premium equal to 110% of expected covered losses. In year b, losses 
will experience 5% inflation over year a, but the frequency of losses will not 
change. Determine the ratio of the premium in year b to the premium in year 
a. 


5.22 (*) Losses have a uniform distribution from 0 to 50,000. There is a 
per-loss deductible of 5,000 and a policy limit of 20,000 (meaning that the 
maximum covered loss is 25,000). Determine the expected payment given 
that a payment has been made. 


5.23 (*) Losses have a lognormal distribution with $\mu = 10$ and $\sigma = 1$. For
losses below 50,000, no payment is made. For losses between 50,000 and 
100,000, the full amount of the loss is paid. For losses in excess of 100,000 the 
limit of 100,000 is paid. Determine the expected cost per loss. 


5.6 THE IMPACT OF DEDUCTIBLES ON CLAIM FREQUENCY 


An important component in analyzing the effect of policy modifications per- 
tains to the change in the frequency distribution of payments when the de- 
ductible (ordinary or franchise) is imposed or changed. When a deductible 
is imposed or increased, there will be fewer payments per period, while if a 
deductible is lowered, there will be more payments. 

We can quantify this process, provided it can be assumed that the
imposition of coverage modifications does not affect the process that produces
losses or the type of individual who will purchase insurance. For example,
those who buy a 250 deductible on an automobile property damage coverage
may (correctly) view themselves as less likely to be involved in an accident
than those who buy full coverage. Similarly, an employer may find that the
rate of permanent disability declines when reduced benefits are provided to
employees in the first few years of employment.

To begin, suppose $X_j$, the severity, represents the ground-up loss on the jth
such loss and there are no coverage modifications. Let $N^L$ denote the number
of losses. Now consider a coverage modification such that $v$ is the probability
that a loss will result in a payment. For example, if there is a deductible of
$d$, $v = \Pr(X > d)$. Next, define the indicator random variable $I_j$ by $I_j = 1$
if the jth loss results in a payment and $I_j = 0$ otherwise. Then $I_j$ has a
Bernoulli distribution with parameter $v$ and the pgf of $I_j$ is
$P_{I_j}(z) = 1 - v + vz$. Then $N^P = I_1 + \cdots + I_{N^L}$ represents the number of
payments. If $I_1, I_2, \ldots$ are mutually independent and are also independent of
$N^L$, then $N^P$ has a compound distribution with $N^L$ as the primary
distribution and a Bernoulli secondary distribution. Thus

$$P_{N^P}(z) = P_{N^L}[P_{I_j}(z)] = P_{N^L}[1 + v(z - 1)].$$

In the important special case in which the distribution of $N^L$ depends on
a parameter $\theta$ such that

$$P_{N^L}(z) = P_{N^L}(z; \theta) = B[\theta(z - 1)],$$

where $B(z)$ is functionally independent of $\theta$ (as in Theorem 4.54), then

$$\begin{aligned}
P_{N^P}(z) &= B[\theta(1 - v + vz - 1)] \\
&= B[v\theta(z - 1)] \\
&= P_{N^L}(z; v\theta).
\end{aligned}$$

This implies that $N^L$ and $N^P$ are both from the same parametric family and
only the parameter $\theta$ need be changed.


Example 5.16 Demonstrate that the above result applies to the negative
binomial distribution and illustrate the effect when a deductible of 250 is
imposed on a negative binomial distribution with $r = 2$ and $\beta = 3$. Assume that
losses have a Pareto distribution with $\alpha = 3$ and $\theta = 1,000$.


The negative binomial pgf is $P_{N^L}(z) = [1 - \beta(z - 1)]^{-r}$. Here $\beta$ takes on the
role of $\theta$ in the result and $B(z) = (1 - z)^{-r}$. Then $N^P$ must have a negative
binomial distribution with $r^* = r$ and $\beta^* = v\beta$. For the particular situation
described,

$$v = 1 - F(250) = \left(\frac{1,000}{1,000 + 250}\right)^3 = 0.512$$

and so $r^* = 2$ and $\beta^* = 3(0.512) = 1.536$. □


This result may be generalized for zero-modified and zero-truncated
distributions. Suppose $N^L$ depends on parameters $\theta$ and $\alpha$ such that

$$P_{N^L}(z) = P_{N^L}(z; \theta, \alpha) = \alpha + (1 - \alpha)\frac{B[\theta(z - 1)] - B(-\theta)}{1 - B(-\theta)}. \qquad (5.3)$$

Note that $\alpha = P_{N^L}(0) = \Pr(N^L = 0)$ and so is the modified probability at
zero. It is also the case that, if $B[\theta(z - 1)]$ is itself a pgf, then the pgf given in
(5.3) is that for the corresponding zero-modified distribution. However, it is
not necessary for $B[\theta(z - 1)]$ to be a pgf in order for $P_{N^L}(z)$ as given in (5.3)
to be a pgf. In particular, $B(z) = 1 + \ln(1 - z)$ yields the zero-modified (ZM)
logarithmic distribution, even though there is no distribution with $B(z)$ as its
pgf. Similarly, $B(z) = (1 - z)^{-r}$ for $-1 < r < 0$ yields the ETNB distribution.
A few algebraic steps reveal that for (5.3)

$$P_{N^P}(z) = P_{N^L}(z; v\theta, \alpha^*),$$

where $\alpha^* = \Pr(N^P = 0) = P_{N^P}(0) = P_{N^L}(1 - v; \theta, \alpha)$. It is expected that
imposing a deductible will increase the value of $\alpha$ because periods with no
payments will become more likely. In particular, if $N^L$ is zero truncated, $N^P$
will be zero modified.


Example 5.17 Repeat the previous example, only now let the frequency
distribution be zero-modified negative binomial with $r = 2$, $\beta = 3$, and
$p_0^M = 0.4$.


The pgf is

$$P_{N^L}(z) = p_0^M + (1 - p_0^M)\frac{[1 - \beta(z - 1)]^{-r} - (1 + \beta)^{-r}}{1 - (1 + \beta)^{-r}}.$$

Then $\alpha = p_0^M$ and $B(z) = (1 - z)^{-r}$. We then have $r^* = r$, $\beta^* = v\beta$, and

$$\begin{aligned}
\alpha^* = p_0^{M*} &= p_0^M + (1 - p_0^M)\frac{(1 + v\beta)^{-r} - (1 + \beta)^{-r}}{1 - (1 + \beta)^{-r}} \\
&= \frac{p_0^M - (1 + \beta)^{-r} + (1 + v\beta)^{-r} - p_0^M(1 + v\beta)^{-r}}{1 - (1 + \beta)^{-r}}.
\end{aligned}$$

For the particular distribution given, the new parameters are $r^* = 2$,
$\beta^* = 3(0.512) = 1.536$, and

$$p_0^{M*} = \frac{0.4 - 4^{-2} + 2.536^{-2} - 0.4(2.536)^{-2}}{1 - 4^{-2}} = 0.4595.$$

□


In applications, it may be the case that we want to determine the
distribution of $N^L$ from that of $N^P$. For example, data may have been collected on
the number of payments in the presence of a deductible and from that data
the parameters of $N^P$ can be estimated. We may then want to know the
distribution of payments if the deductible is removed. Arguing as before,

$$P_{N^L}(z) = P_{N^P}(1 - v^{-1} + zv^{-1}).$$

This implies that the formulas derived previously hold with $v$ replaced by $1/v$.
However, it is possible that the resulting pgf for $N^L$ is not valid. In this case
one of the modeling assumptions is invalid (for example, the assumption that
changing the deductible does not change claim-related behavior).


Example 5.18 Suppose payments on a policy with a deductible of 250 have
the zero-modified negative binomial distribution with $r^* = 2$, $\beta^* = 1.536$, and
$p_0^{M*} = 0.4595$. Losses have the Pareto distribution with $\alpha = 3$ and
$\theta = 1,000$. Determine the distribution of the number of payments when the
deductible is removed. Repeat this calculation assuming $p_0^{M*} = 0.002$.

In this case the formulas use $v = 1/0.512 = 1.953125$ and so $r = 2$ and
$\beta = 1.953125(1.536) = 3$. Also,

$$p_0^M = \frac{0.4595 - 2.536^{-2} + 4^{-2} - 0.4595(4)^{-2}}{1 - 2.536^{-2}} = 0.4$$

as expected. For the second case,

$$p_0^M = \frac{0.002 - 2.536^{-2} + 4^{-2} - 0.002(4)^{-2}}{1 - 2.536^{-2}} = -0.1079,$$

which is not a legitimate probability. □
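
The parameter adjustments used in Examples 5.16 through 5.18 can be collected into one small routine. This is only an illustrative sketch: the function name `zm_negbin_adjust` is hypothetical, and the only validity check applied is that the resulting probability at zero lies in [0, 1). Passing $v < 1$ imposes a deductible, and passing $1/v$ removes it.

```python
def zm_negbin_adjust(r, beta, p0m, v):
    """Adjust a zero-modified negative binomial frequency by the factor v.

    Returns (r_star, beta_star, p0m_star, is_valid), where p0m_star follows
    the zero-modified negative binomial formula of Example 5.17.
    """
    a = (1 + beta) ** (-r)        # (1 + beta)^(-r)
    b = (1 + v * beta) ** (-r)    # (1 + v*beta)^(-r)
    p0m_star = (p0m - a + b - p0m * b) / (1 - a)
    is_valid = 0 <= p0m_star < 1  # a legitimate probability at zero
    return r, v * beta, p0m_star, is_valid

# Probability that a Pareto(alpha = 3, theta = 1,000) loss exceeds 250.
v = (1000 / (1000 + 250)) ** 3

# Example 5.17: impose the deductible.
print(zm_negbin_adjust(2, 3, 0.4, v))

# Example 5.18: remove the deductible (use 1/v), for both starting values.
print(zm_negbin_adjust(2, 1.536, 0.4595, 1 / v))
print(zm_negbin_adjust(2, 1.536, 0.002, 1 / v))
```

The third call returns a negative value for the probability at zero, which flags the same inconsistency noted at the end of Example 5.18.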


All members of the $(a, b, 0)$ and $(a, b, 1)$ classes meet the conditions of this
section. Table 5.3 indicates how the parameters change when moving from
$N^L$ to $N^P$. If $N^L$ has a compound distribution, then we can write
$P_{N^L}(z) = P_1[P_2(z)]$ and therefore

$$P_{N^P}(z) = P_{N^L}[1 + v(z - 1)] = P_1\{P_2[1 + v(z - 1)]\}.$$

Thus $N^P$ will also have a compound distribution with the secondary
distribution modified as indicated. If the secondary distribution has an $(a, b, 0)$
distribution, then it can be modified as in Table 5.3. The following example
indicates the adjustment to be made if the secondary distribution has an
$(a, b, 1)$ distribution.


Table 5.3 Frequency adjustments

$N^L$                   Parameters for $N^P$

Poisson                 $\lambda^* = v\lambda$

ZM Poisson              $p_0^{M*} = \dfrac{p_0^M - e^{-\lambda} + e^{-v\lambda} - p_0^M e^{-v\lambda}}{1 - e^{-\lambda}}$, $\lambda^* = v\lambda$

Binomial                $q^* = vq$

ZM binomial             $p_0^{M*} = \dfrac{p_0^M - (1 - q)^m + (1 - vq)^m - p_0^M(1 - vq)^m}{1 - (1 - q)^m}$, $q^* = vq$

Negative binomial       $\beta^* = v\beta$, $r^* = r$

ZM negative binomial    $p_0^{M*} = \dfrac{p_0^M - (1 + \beta)^{-r} + (1 + v\beta)^{-r} - p_0^M(1 + v\beta)^{-r}}{1 - (1 + \beta)^{-r}}$, $\beta^* = v\beta$, $r^* = r$

ZM logarithmic          $p_0^{M*} = 1 - (1 - p_0^M)\ln(1 + v\beta)/\ln(1 + \beta)$, $\beta^* = v\beta$


Example 5.19 Suppose $N^L$ is Poisson-ETNB with $\lambda = 5$, $\beta = 0.3$, and
$r = 4$. If $v = 0.5$, determine the distribution of $N^P$.


From the discussion above, $N^P$ is compound Poisson with $\lambda^* = 5$, but the
secondary distribution is a zero-modified negative binomial with (from Table
5.3) $\beta^* = 0.5(0.3) = 0.15$,

$$p_0^{M*} = \frac{0 - 1.3^{-4} + 1.15^{-4} - 0(1.15)^{-4}}{1 - 1.3^{-4}} = 0.34103,$$

and $r^* = 4$. This would be sufficient, except we have acquired the habit of
using the ETNB as the secondary distribution. From Theorem 4.54 a compound
Poisson distribution with a zero-modified secondary distribution is equivalent
to a compound Poisson distribution with a zero-truncated secondary
distribution. The Poisson parameter must be changed to $(1 - p_0^{M*})\lambda^*$. Therefore,
$N^P$ has a Poisson-ETNB distribution with $\lambda^* = (1 - 0.34103)5 = 3.29485$,
$\beta^* = 0.15$, and $r^* = 4$. □
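
The two-step adjustment in Example 5.19 (modify the secondary distribution, then fold its probability at zero back into the Poisson parameter) can be sketched as follows. The function name is hypothetical and the code assumes the secondary distribution is an ETNB, so its starting probability at zero is 0.

```python
def thin_poisson_etnb(lam, beta, r, v):
    """Move a Poisson-ETNB loss count to the payment count for factor v.

    Step 1: the ETNB secondary (zero probability 0) becomes a zero-modified
            negative binomial with zero probability p0_star and beta* = v*beta.
    Step 2: re-express it with a zero-truncated secondary by rescaling the
            Poisson parameter to (1 - p0_star) * lam.
    """
    a = (1 + beta) ** (-r)
    b = (1 + v * beta) ** (-r)
    p0_star = (b - a) / (1 - a)  # ZM negative binomial formula with p0M = 0
    return (1 - p0_star) * lam, v * beta, r, p0_star

lam_star, beta_star, r_star, p0_star = thin_poisson_etnb(5, 0.3, 4, 0.5)
print(round(p0_star, 5), round(lam_star, 5), beta_star, r_star)
```

For the inputs of the example the routine gives a zero probability of about 0.34103 and a Poisson parameter of about 3.29485, in agreement with the text.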


The results can be further generalized to an increase or decrease in the
deductible. Let $N^d$ be the frequency when the deductible is $d$ and let $N^{d^*}$ be
the frequency when the deductible is $d^*$. Let $v = [1 - F_X(d^*)]/[1 - F_X(d)]$,
and then Table 5.3 can be used to move from the parameters of $N^d$ to the
parameters of $N^{d^*}$. As long as $d^* > d$, we will have $v < 1$ and the formulas
will lead to a legitimate distribution for $N^{d^*}$. This includes the special case
of $d = 0$ that was used at the start of this section. If $d^* < d$, then $v > 1$ and
there is no assurance that a legitimate distribution will result. This includes
the special case $d^* = 0$ (removal of a deductible) covered earlier.

Finally, it should be noted that policy limits have no effect on the frequency 
distribution. Imposing, removing, or changing a limit will not change the 
number of payments made. 


5.6.1 Exercises 


5.24 A group life insurance policy has an accidental death rider. For ordinary
deaths, the benefit is 10,000; however, for accidental deaths, the benefit is
20,000. The insureds are approximately the same age, so it is reasonable
to assume they all have the same claim probabilities. Let them be 0.97 for
no claim, 0.01 for an ordinary death claim, and 0.02 for an accidental death
claim. A reinsurer has been asked to bid on providing an excess reinsurance
that will pay 10,000 for each accidental death.


(a) The claim process can be modeled with a frequency component that 
has the Bernoulli distribution (the event is claim/no claim) and 
a two-point severity component (the probabilities are associated 
with the two claim levels, given that a claim occurred). Specify 
the probability distributions for the frequency and severity random 
variables. 

(b) Suppose the reinsurer wants to retain the same frequency
distribution. Determine the modified severity distribution that will
reflect the reinsurer’s payments.

(c) Determine the reinsurer’s frequency and severity distributions when
the severity distribution is to be conditional on a reinsurance
payment being made.


5.25 Individual losses have a Pareto distribution with $\alpha = 2$ and $\theta = 1,000$.
With a deductible of 500 the frequency distribution for the number of
payments is Poisson-inverse Gaussian with $\lambda = 3$ and $\beta = 2$. If the deductible is
raised to 1,000, determine the distribution for the number of payments. Also, 
determine the pdf of the severity distribution (per payment) when the new 
deductible is in place. 


5.26 Losses have a Pareto distribution with $\alpha = 2$ and $\theta = 1,000$. The
frequency distribution for a deductible of 500 is zero-truncated logarithmic with
$\beta = 4$. Determine a model for the number of payments when the deductible
is reduced to 0. 


5.27 Suppose that the number of losses $N^L$ has the Sibuya distribution (see
Exercises 4.47 and 4.59) with pgf $P_{N^L}(z) = 1 - (1 - z)^{-r}$, where $-1 < r < 0$.
Demonstrate that the number of payments has a zero-modified Sibuya
distribution.


Aggregate loss models 


6.1 INTRODUCTION 


The purpose of this chapter is to develop models of aggregate losses, the total
amount paid on all claims occurring in a fixed time period on a defined set
of insurance contracts. There are two ways to go about adding the claims in
order to obtain the total for the period.

One method is to record the payments as they are made and then add
them up. In that case we can represent the aggregate losses as a sum, $S$,
of a random number, $N$, of individual payment amounts $(X_1, X_2, \ldots, X_N)$.
Hence,

$$S = X_1 + X_2 + \cdots + X_N, \qquad N = 0, 1, 2, \ldots, \qquad (6.1)$$

where $S = 0$ when $N = 0$.


Definition 6.1 The collective risk model has the representation in (6.1)
with the $X_j$s being independent and identically distributed (i.i.d.) random
variables, unless otherwise specified. More formally, the independence
assumptions are:


1. Conditional on $N = n$, the random variables $X_1, X_2, \ldots, X_n$ are i.i.d.
random variables.


2. Conditional on $N = n$, the common distribution of the random variables
$X_1, X_2, \ldots, X_n$ does not depend on $n$.


3. The distribution of $N$ does not depend in any way on the values of
$X_1, X_2, \ldots$.


Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.

An alternative model is given below.


Definition 6.2 The individual risk model represents the aggregate loss as
a sum, $S = X_1 + \cdots + X_n$, of a fixed number, $n$, of insurance contracts. The
loss amounts for the $n$ contracts are $(X_1, X_2, \ldots, X_n)$, where the $X_j$s are
assumed to be independent but are not assumed to be identically distributed. The
distribution of the $X_j$s usually has a probability mass at zero, corresponding
to the probability of no loss or payment.


The individual risk model is used to add together the losses or payments 
from a fixed number of insurance contracts or sets of insurance contracts. 
It is used in modeling the losses of a group life or health insurance policy 
that covers a group of n employees. Each employee can have different cover- 
age (life insurance benefit as a multiple of salary) and different levels of loss 
probabilities (different ages and health status). 

In the special case where the $X_j$s are identically distributed, the individual
risk model becomes a special case of the collective risk model, with the
distribution of $N$ being the degenerate distribution with all of the probability
at $N = n$; that is, $\Pr(N = n) = 1$.

The distribution of $S$ in (6.1) is obtained from the distribution of $N$ and the
distribution of the $X_j$s. Using this approach, the frequency and the severity
of claims are modeled separately. The information about these distributions
is used to obtain information about $S$. An alternative to this approach is to
simply gather information about $S$ (e.g., total losses each month for a period
of months) and to use some model from Chapter 4 to model the distribution of
$S$. Modeling the distribution of $N$ and the distribution of the $X_j$s separately
has some distinct advantages:


1. The expected number of claims changes as the number of insured policies 
changes. Growth in the volume of business needs to be accounted for 
in forecasting the number of claims in future years based on past years’ 
data. 


2. The effects of general economic inflation and additional claims inflation 
are reflected in the losses incurred by insured parties and the claims 
paid by insurance companies. Such effects are often masked when in- 
surance policies have deductibles and policy limits which do not depend 
on inflation and aggregate results are used. 


3. The impact of changing individual deductibles and policy limits is more 
easily studied. This is done by changing the specification of the severity 
distribution. 




4. The impact on claims frequencies of changing deductibles is better un- 
derstood. 


5. Data that are heterogeneous in terms of deductibles and limits can be 
combined to obtain the hypothetical loss size distribution. This is useful 
when data from several years in which policy provisions were changing 
are combined. 


6. Models developed for noncovered losses to insureds, claim costs to in- 
surers, and claim costs to reinsurers can be mutually consistent. This is 
useful for a direct insurer which is studying the consequence of shifting 
losses to a reinsurer. 


7. The shape of the distribution of S depends on the shapes of both distri- 
butions of N and X. The understanding of the relative shapes is useful 
when modifying policy details. For example, if the severity distribution 
has a much heavier tail than the frequency distribution, the shape of 
the tail of the distribution of aggregate claims or losses will be deter- 
mined by the severity distribution and will be insensitive to the choice 
of frequency distribution. 


In summary, a more accurate and flexible model can be constructed by 
examining frequency and severity separately. 

In constructing the model (6.1) for S, if N represents the actual number of 
losses to the insured, then the Xjs can represent (i) the losses to the insured, 
(ii) the claim payments of the insurer, (iii) the claim payments of a reinsurer, 
or (iv) the deductibles (self-insurance) paid by the insured. In each case, the 
interpretation of S is different and the severity distribution can be constructed 
in a consistent manner. 

Because the random variables $N$, $X_1, X_2, \ldots$, and $S$ provide much of the
focus for this chapter and the two that follow, we want to be especially
careful when referring to them. We will refer to $N$ as the claim count random
variable and will refer to its distribution as the claim count distribution.
The expression number of claims will also be used, and, occasionally,
just claims will be used. Another term that will commonly be used is
frequency distribution. The $X_j$s are the individual or single-loss random
variables. The modifier individual or single will be dropped when the
reference is clear. In Chapter 5 a distinction was made between losses and
payments. Strictly speaking, the $X_j$s are payments because they represent a
real cash transaction. However, the term loss is more customary, and we will
continue with it. Another common term for the $X_j$s is severity. Finally, $S$ is
the aggregate loss random variable or the total loss random variable.


Example 6.3 An insurer estimates that individual losses to an insured follow
a distribution with pdf $f_X(x)$. The insurer pays 80% of individual losses in
excess of 1,000 with a maximum payment of 100,000. It reinsures that portion
of any payments in excess of 50,000. Develop models for each of the following:
(a) the total loss preinsurance to the policyholder, (b) the aggregate loss to
the insurer prior to the reinsurance payment, (c) the aggregate loss to the
reinsurer, (d) the aggregate loss to the insurer, after the reinsurance payment,
and (e) the aggregate loss to the insured.


(a) The aggregate losses with no insurance are $S = X_1 + X_2 + \cdots + X_N$,
where the $X_j$s have the distribution with pdf $f_X(x)$.


(b) The aggregate payments by the insurer (before recovery of
reinsurance) are $S = Y_1 + Y_2 + \cdots + Y_N$, where

$$Y_j = \begin{cases} 0, & X_j < 1,000, \\ 0.80(X_j - 1,000), & 1,000 \le X_j < 126,000, \\ 100,000, & X_j \ge 126,000. \end{cases}$$


(c) The aggregate payments by the reinsurer are $S = Y_1 + Y_2 + \cdots + Y_N$,
where

$$Y_j = \begin{cases} 0, & X_j < 63,500, \\ 0.80(X_j - 63,500), & 63,500 \le X_j < 126,000, \\ 50,000, & X_j \ge 126,000. \end{cases}$$


(d) The aggregate costs to the insurer after recovery of reinsurance
payments are $S = Y_1 + Y_2 + \cdots + Y_N$, where

$$Y_j = \begin{cases} 0, & X_j < 1,000, \\ 0.80(X_j - 1,000), & 1,000 \le X_j < 63,500, \\ 50,000, & X_j \ge 63,500. \end{cases}$$


(e) The aggregate costs to the insured, that is, the uninsured costs,
are $S = Y_1 + Y_2 + \cdots + Y_N$, where

$$Y_j = \begin{cases} X_j, & X_j < 1,000, \\ 800 + 0.20X_j, & 1,000 \le X_j < 126,000, \\ X_j - 100,000, & X_j \ge 126,000. \end{cases}$$
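
The piecewise definitions in Example 6.3 can be written compactly with `min` and `max`. The sketch below is illustrative only: the function names are hypothetical, and the numerical check at a handful of loss amounts is not a substitute for the algebraic argument requested in the exercise that follows.

```python
def insurer_gross(x):
    """Insurer payment before reinsurance: 80% of the loss above 1,000, capped at 100,000."""
    return min(0.80 * max(x - 1_000, 0.0), 100_000.0)

def reinsurer(x):
    """Reinsurer pays the part of the insurer's payment above 50,000."""
    return max(insurer_gross(x) - 50_000.0, 0.0)

def insurer_net(x):
    """Insurer cost after reinsurance recovery: payment capped at 50,000."""
    return min(insurer_gross(x), 50_000.0)

def insured(x):
    """Uninsured cost retained by the policyholder."""
    return x - insurer_gross(x)

# Spot-check the breakpoints 1,000, 63,500 and 126,000 and a few other losses.
for loss in [0, 500, 1_000, 40_000, 63_500, 100_000, 126_000, 500_000]:
    parts = (insured(loss), insurer_net(loss), reinsurer(loss))
    print(loss, parts, abs(sum(parts) - loss) < 1e-6)
```

Writing each party's share in terms of `insurer_gross` makes the layering explicit: the reinsurer and the net insurer split the gross payment, and the insured keeps the remainder.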


6.1.1 Exercise 


6.1 Show that the costs to the insured, the insurer, and the reinsurer in
Example 6.3 sum to the aggregate loss.


6.2 MODEL CHOICES


In many cases of fitting frequency or severity distributions to data, several 
distributions may be good candidates for models. However, some distributions 
may be preferable for a variety of practical reasons. 

In general, it is useful for the severity distribution to be a scale distribution 
(see Definition 4.2) because the choice of currency (e.g., U.S. dollars or British 
pounds) should not affect the result. Also, scale families are easy to adjust 
for inflationary effects over time (this is, in effect, a change in currency; e.g., 
1994 U.S. dollars to 1995 U.S. dollars). When forecasting the costs for a future 
year, the anticipated rate of inflation can be factored in easily by adjusting 
the parameters. 

A similar consideration applies to frequency distributions. As a block of an
insurance company’s business grows, the number of claims can be expected to
grow, all other things being equal. If one chooses models that have probability
generating functions of the form

$$P_N(z; \alpha) = Q(z)^{\alpha} \qquad (6.2)$$

for some parameter $\alpha$, then the expected number of claims is proportional
to $\alpha$. Increasing the volume of business by 100r% results in expected claims
being proportional to $\alpha^* = (1 + r)\alpha$. This was discussed in Section 4.6.11.
Because $r$ is any value satisfying $r > -1$, the distributions satisfying (6.2)
should allow $\alpha$ to take on any positive values. Such distributions can be
shown to be infinitely divisible (see Definition 4.65).

A related consideration also suggests frequency distributions that are infi- 
nitely divisible. This relates to the concept of invariance over the time period 
of study. Ideally the model selected should not depend on the length of the 
time period used in the study of claims frequency. The expected frequency 
should be proportional to the length of the time period after any adjustment 
for growth in business. This means that a study conducted over a period of 
10 years can be used to develop claims frequency distributions for periods of a 
month, a year, or any other period. Furthermore, the form of the distribution
for a one-year period is the same as for a one-month period with a change of
parameter. The parameter $\alpha$ corresponds to the length of a time period. For
example, if $\alpha = 1.7$ in (6.2) for a one-month period, then the identical model
with $\alpha = 20.4$ is an appropriate model for a one-year period.

Distributions that have a modification at zero are not of the form (6.2). 
However, it may still be desirable to use a zero-modified distribution if the 
physical situation suggests it. For example, if a certain proportion of policies 
never make a claim, due to duplication of coverage or other reason, it may be 
appropriate to use this same proportion in future periods for a policy selected 
at random.


6.2.1 Exercises


6.2 For pgfs satisfying (6.2), show that the mean is proportional to $\alpha$.


6.3 Which of the distributions in Appendix B satisfy (6.2) for any positive
value of $\alpha$?


6.3 THE COMPOUND MODEL FOR AGGREGATE CLAIMS 


Let $S$ denote aggregate losses associated with a set of $N$ observed claims
$X_1, X_2, \ldots, X_N$ satisfying the independence assumptions following (6.1). The
approach in this chapter is to:


1. Develop a model for the distribution of $N$ based on data.


2. Develop a model for the common distribution of the $X_j$s based on data.


3. Using these two models, carry out necessary calculations to obtain the
distribution of $S$.


Completion of the first two steps follows the ideas developed elsewhere in 
this text. We now presume that these two models are developed and that 
we only need to carry out numerical work in obtaining solutions to problems 
associated with the distribution of S. These might involve pricing a stop-loss 
reinsurance contract, and they require analyzing the impact of changes in 
deductibles, coinsurance levels, and maximum payments on individual losses. 

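The random sum

$$S = X_1 + X_2 + \cdots + X_N$$

(where $N$ has a counting distribution) has a distribution function

$$\begin{aligned}
F_S(x) &= \Pr(S \le x) \\
&= \sum_{n=0}^{\infty} p_n \Pr(S \le x \mid N = n) \\
&= \sum_{n=0}^{\infty} p_n F_X^{*n}(x), \qquad (6.3)
\end{aligned}$$

where $F_X(x) = \Pr(X \le x)$ is the common distribution function of the $X_j$s
and $p_n = \Pr(N = n)$. In (6.3), $F_X^{*n}(x)$ is the “n-fold convolution” of the cdf
of $X$. It can be obtained as

$$F_X^{*0}(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

and

$$F_X^{*k}(x) = \int_{-\infty}^{\infty} F_X^{*(k-1)}(x - y)\,dF_X(y) \quad \text{for } k = 1, 2, \ldots. \qquad (6.4)$$

If $X$ is a continuous random variable with probability zero on negative values,
(6.4) reduces to

$$F_X^{*k}(x) = \int_0^x F_X^{*(k-1)}(x - y) f_X(y)\,dy \quad \text{for } k = 2, 3, \ldots.$$

For $k = 1$ this equation reduces to $F_X^{*1}(x) = F_X(x)$. By differentiating, the
pdf is

$$f_X^{*k}(x) = \int_0^x f_X^{*(k-1)}(x - y) f_X(y)\,dy \quad \text{for } k = 2, 3, \ldots.$$

In the case of discrete random variables with positive probabilities at
$0, 1, 2, \ldots$, Equation (6.4) reduces to

$$F_X^{*k}(x) = \sum_{y=0}^{x} F_X^{*(k-1)}(x - y) f_X(y) \quad \text{for } x = 0, 1, \ldots, \; k = 2, 3, \ldots.$$

The corresponding pf is

$$f_X^{*k}(x) = \sum_{y=0}^{x} f_X^{*(k-1)}(x - y) f_X(y) \quad \text{for } x = 0, 1, \ldots, \; k = 2, 3, \ldots.$$

The distribution (6.3) is called a compound distribution and the pf for
the distribution of aggregate losses is

$$f_S(x) = \sum_{n=0}^{\infty} p_n f_X^{*n}(x).$$

Arguing as in Section 4.6.7, the pgf of $S$ is

$$\begin{aligned}
P_S(z) &= E[z^S] \\
&= \sum_{n=0}^{\infty} E[z^{X_1 + X_2 + \cdots + X_n} \mid N = n] \Pr(N = n) \\
&= \sum_{n=0}^{\infty} E\left[\prod_{j=1}^{n} z^{X_j}\right] \Pr(N = n) \\
&= \sum_{n=0}^{\infty} \Pr(N = n)[P_X(z)]^n \\
&= E[P_X(z)^N] = P_N[P_X(z)] \qquad (6.5)
\end{aligned}$$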


due to the independence of $X_1, \ldots, X_n$ for fixed $n$.

A similar relationship exists for the other generating functions. It is
sometimes more convenient to use the characteristic function

$$\varphi_S(z) = E(e^{izS}) = P_N[\varphi_X(z)],$$

which always exists. Panjer and Willmot [106] use the Laplace transform

$$L_S(z) = E(e^{-zS}) = P_N[L_X(z)],$$

which always exists for random variables defined on nonnegative values. With
regard to the moment generating function, we have

$$M_S(z) = P_N[M_X(z)].$$

The pgf of compound distributions was discussed in Section 4.6.7 where the
“secondary” distribution plays the role of the claim size distribution in this
chapter.

In the case where $P_N(z) = P_1[P_2(z)]$, that is, $N$ is itself a compound
distribution, $P_S(z) = P_1\{P_2[P_X(z)]\}$, which in itself produces no additional
difficulties.

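From (6.5), the moments of $S$ can be obtained in terms of the moments of
$N$ and the $X_j$s. The first three moments are

$$\begin{aligned}
E(S) &= \mu'_{S1} = \mu'_{N1}\mu'_{X1} = E(N)E(X), \\
\mathrm{Var}(S) &= \mu_{S2} = \mu'_{N1}\mu_{X2} + \mu_{N2}(\mu'_{X1})^2, \qquad (6.6) \\
E\{[S - E(S)]^3\} &= \mu_{S3} = \mu'_{N1}\mu_{X3} + 3\mu_{N2}\mu'_{X1}\mu_{X2} + \mu_{N3}(\mu'_{X1})^3.
\end{aligned}$$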

Here, the first subscript indicates the appropriate random variable, the second 
subscript indicates the order of the moment, and the superscript is a prime 
(’) for raw moments (moments about the origin) and is unprimed for central 
moments (moments about the mean). The moments can be used on their own 
to provide approximations for probabilities of aggregate claims by matching 
the first few model and sample moments. 


Example 6.4 The observed mean (and standard deviation) of the number of
claims and the individual losses over the past 10 months are 6.7 (2.3) and
179,247 (52,141), respectively. Determine the mean and variance of aggregate
claims per month.

$$\begin{aligned}
E(S) &= 6.7(179,247) = 1,200,955, \\
\mathrm{Var}(S) &= 6.7(52,141)^2 + (2.3)^2(179,247)^2 \\
&= 1.88180 \times 10^{11}.
\end{aligned}$$

Hence, the mean and standard deviation of aggregate claims are 1,200,955
and 433,797, respectively. □


Example 6.5 (Example 6.4 continued) Using normal and lognormal
distributions as approximating distributions for aggregate claims, calculate the
probability that claims will exceed 140% of expected costs. That is,

$$\Pr(S > 1.40 \times 1,200,955) = \Pr(S > 1,681,337).$$

For the normal distribution

$$\begin{aligned}
\Pr(S > 1,681,337) &= \Pr\left(\frac{S - E(S)}{\sqrt{\mathrm{Var}(S)}} > \frac{1,681,337 - 1,200,955}{433,797}\right) \\
&= \Pr(Z > 1.107) = 1 - \Phi(1.107) = 0.134.
\end{aligned}$$

For the lognormal distribution, from Appendix A, the mean and second
raw moment of the lognormal distribution are

$$E(S) = \exp(\mu + \tfrac{1}{2}\sigma^2) \quad \text{and} \quad E(S^2) = \exp(2\mu + 2\sigma^2).$$

Equating these to $1.200955 \times 10^6$ and
$1.88180 \times 10^{11} + (1.200955 \times 10^6)^2 = 1.63047 \times 10^{12}$ and taking
logarithms results in the following two equations in two unknowns:

$$\mu + \tfrac{1}{2}\sigma^2 = 13.99863, \qquad 2\mu + 2\sigma^2 = 28.11989.$$

From this $\mu = 13.93731$ and $\sigma^2 = 0.1226361$. Then

$$\begin{aligned}
\Pr(S > 1,681,337) &= 1 - \Phi\left[\frac{\ln 1,681,337 - 13.93731}{(0.1226361)^{0.5}}\right] \\
&= 1 - \Phi(1.135913) = 0.128.
\end{aligned}$$

The normal distribution provides a good approximation when $E(N)$ is large.
In particular, if $N$ has the Poisson, binomial, or negative binomial
distribution, a version of the central limit theorem indicates that, as $\lambda$, $m$, or $r$,
respectively, goes to infinity, the distribution of $S$ becomes normal. In this
example, $E(N)$ is small so the distribution of $S$ is likely to be skewed. In this
case the lognormal distribution may provide a good approximation, although
there is no theory to support this choice. □
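
The moment matching in Examples 6.4 and 6.5 is easy to reproduce. The following sketch uses only the Python standard library; the function names are hypothetical, the normal cdf is computed from `math.erf`, and the lognormal parameters are obtained by solving the two moment equations in closed form, which is equivalent to the two-equation system in the text.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def aggregate_mean_var(mean_n, sd_n, mean_x, sd_x):
    """First two moments of S from (6.6)."""
    mean_s = mean_n * mean_x
    var_s = mean_n * sd_x**2 + sd_n**2 * mean_x**2
    return mean_s, var_s

def normal_tail(mean_s, var_s, x):
    """Pr(S > x) under a normal approximation with matched moments."""
    return 1.0 - std_normal_cdf((x - mean_s) / math.sqrt(var_s))

def lognormal_tail(mean_s, var_s, x):
    """Pr(S > x) under a lognormal approximation with matched moments."""
    sigma2 = math.log(1.0 + var_s / mean_s**2)  # ln[E(S^2) / E(S)^2]
    mu = math.log(mean_s) - 0.5 * sigma2
    return 1.0 - std_normal_cdf((math.log(x) - mu) / math.sqrt(sigma2))

mean_s, var_s = aggregate_mean_var(6.7, 2.3, 179_247, 52_141)
threshold = 1.40 * mean_s
print(round(mean_s), round(math.sqrt(var_s)))
print(round(normal_tail(mean_s, var_s, threshold), 3))
print(round(lognormal_tail(mean_s, var_s, threshold), 3))
```

The two tail probabilities come out close to the values 0.134 and 0.128 reported in Example 6.5; small differences in later digits can arise because the text rounds intermediate quantities.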


Example 6.6 (Group dental insurance) Under a group dental insurance plan 
covering employees and their families, the premium for each married employee 
is the same regardless of the number of family members. The insurance com- 
pany has compiled statistics showing that the annual cost (adjusted to current 
dollars) of dental care per person for the benefits provided by the plan has the 
distribution in Table 6.1 (given in units of 25 dollars). 

Furthermore, the distribution of the number of persons per insurance cer- 
tificate (that is, per employee) receiving dental care in any year has the dis- 
tribution given in Table 6.2. 
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Table 6.1 Loss distribution for Example 6.6 


 x    f_X(x)
 1    0.150
 2    0.200
 3    0.250
 4    0.125
 5    0.075
 6    0.050
 7    0.050
 8    0.050
 9    0.025
10    0.025


Table 6.2 Frequency distribution for Example 6.6 


 n    p_n
 0    0.05
 1    0.10
 2    0.15
 3    0.20
 4    0.25
 5    0.15
 6    0.06
 7    0.03
 8    0.01


The insurer is now in a position to calculate the distribution of the cost per 
year per married employee in the group. The cost per married employee is 


\[
f_S(x) = \sum_{n=0}^{8} p_n f_X^{*n}(x).
\]


Determine the pf of S up to 525. Determine the mean and standard devi- 
ation of total payments per employee. 


The distribution up to amounts of 525 is given in Table 6.3. To obtain
$f_S(x)$, each row of the matrix of convolutions of $f_X(x)$ is multiplied by the
probabilities from the row below the table and the products are summed.

The reader may wish to verify using (6.6) that the first two moments of 
the distribution fs(x) are 


E(S) = 12.58, Var(S) = 58.7464. 
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Table 6.3 Aggregate probabilities for Example 6.6 


  x  f_X^{*0}  f_X^{*1}  f_X^{*2}  f_X^{*3}  f_X^{*4}  f_X^{*5}  f_X^{*6}  f_X^{*7}  f_X^{*8}  f_S(x)
  0  1         0         0         0         0         0         0         0         0         .05000
  1  0         .150      0         0         0         0         0         0         0         .01500
  2  0         .200      .02250    0         0         0         0         0         0         .02338
  3  0         .250      .06000    .00338    0         0         0         0         0         .03468
  4  0         .125      .11500    .01350    .00051    0         0         0         0         .03258
  5  0         .075      .13750    .03488    .00270    .00008    0         0         0         .03579
  6  0         .050      .13500    .06144    .00878    .00051    .00001    0         0         .03981
  7  0         .050      .10750    .08569    .01999    .00198    .00009    .00000    0         .04356
  8  0         .050      .08813    .09750    .03580    .00549    .00042    .00002    .00000    .04752
  9  0         .025      .07875    .09841    .05266    .01194    .00136    .00008    .00000    .04903
 10  0         .025      .07063    .09338    .06682    .02138    .00345    .00031    .00002    .05190
 11  0         0         .06250    .08813    .07597    .03282    .00726    .00091    .00007    .05138
 12  0         0         .04500    .08370    .08068    .04450    .01305    .00218    .00022    .05119
 13  0         0         .03125    .07673    .08266    .05486    .02062    .00448    .00060    .05030
 14  0         0         .01750    .06689    .08278    .06314    .02930    .00808    .00138    .04818
 15  0         0         .01125    .05377    .08081    .06934    .03826    .01304    .00279    .04576
 16  0         0         .00750    .04125    .07584    .07361    .04677    .01919    .00505    .04281
 17  0         0         .00500    .03052    .06811    .07578    .05438    .02616    .00829    .03938
 18  0         0         .00313    .02267    .05854    .07552    .06080    .03352    .01254    .03575
 19  0         0         .00125    .01673    .04878    .07263    .06573    .04083    .01768    .03197
 20  0         0         .00063    .01186    .03977    .06747    .06882    .04775    .02351    .02832
 21  0         0         0         .00800    .03187    .06079    .06982    .05389    .02977    .02479
p_n  .05       .10       .15       .20       .25       .15       .06       .03       .01


Hence the annual cost of the dental plan has mean 12.58 × 25 = 314.50 dollars
and standard deviation 191.6155 dollars. (Why can’t the calculations be done
from Table 6.3?) □
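The convolution calculation behind Table 6.3 can be sketched in Python as follows. The probabilities are those of Tables 6.1 and 6.2 (in units of 25 dollars); the function names and the truncation argument `x_max` are illustrative choices, not part of the text.

```python
# Severity pf from Table 6.1 (index = amount in units of 25; no mass at 0).
fx = [0.0, 0.150, 0.200, 0.250, 0.125, 0.075, 0.050, 0.050, 0.050, 0.025, 0.025]
# Frequency pf from Table 6.2 (index = number of persons, 0 through 8).
pn = [0.05, 0.10, 0.15, 0.20, 0.25, 0.15, 0.06, 0.03, 0.01]


def convolve(f, g, x_max):
    """Convolution of two pfs on 0, 1, 2, ..., truncated at x_max."""
    h = [0.0] * (x_max + 1)
    for i, fi in enumerate(f):
        if fi == 0.0:
            continue
        for j, gj in enumerate(g):
            if i + j > x_max:
                break
            h[i + j] += fi * gj
    return h


def aggregate_pf(pn, fx, x_max):
    """f_S(x) = sum_n p_n f_X^{*n}(x) for x = 0, ..., x_max."""
    fs = [0.0] * (x_max + 1)
    conv = [1.0] + [0.0] * x_max  # f_X^{*0}: point mass at zero
    for p in pn:
        for x in range(x_max + 1):
            fs[x] += p * conv[x]
        conv = convolve(conv, fx, x_max)  # next convolution power
    return fs


fs_partial = aggregate_pf(pn, fx, 21)  # the range shown in Table 6.3
fs_full = aggregate_pf(pn, fx, 80)     # full support: 8 persons x 10 units

mean_s = sum(x * p for x, p in enumerate(fs_full))
var_s = sum(x * x * p for x, p in enumerate(fs_full)) - mean_s**2
print(round(mean_s, 4), round(var_s, 4))
```

Truncating at `x_max` does not distort the probabilities at or below `x_max`, because amounts above the cutoff can never contribute to a smaller total. The moments, by contrast, are computed here from the full support of S.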


It is common for insurance to be offered in which a deductible is applied to 
the aggregate losses for the period. When the losses occur to a policyholder 
it is called insurance coverage and when the losses occur to an insurance 
company it is called reinsurance coverage. The latter version is a common 
method for an insurance company to protect itself against an adverse year (as 
opposed to protecting against a single, very large claim). More formally, we 
present the following definition. 


Definition 6.7 Insurance on the aggregate losses, subject to a deductible, is
called stop-loss insurance. The expected cost of this insurance is called the
net stop-loss premium and can be computed as $E[(S-d)_+]$, where d is the
deductible and the notation $(\cdot)_+$ means to use the value in parentheses if it is
positive but to use zero otherwise.


For any aggregate distribution,

\[
E[(S-d)_+] = \int_{d}^{\infty} [1 - F_S(x)]\,dx.
\]

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If the distribution is continuous, the net stop-loss premium can be computed 
directly from the definition as 


\[
E[(S-d)_+] = \int_{d}^{\infty} (x-d) f_S(x)\,dx.
\]

Similarly, for discrete random variables,

\[
E[(S-d)_+] = \sum_{x>d} (x-d) f_S(x).
\]

Any time there is an interval with no aggregate probability, the following
result may simplify calculations.


Theorem 6.8 Suppose $\Pr(a < S < b) = 0$. Then, for $a \le d \le b$,

\[
E[(S-d)_+] = \frac{b-d}{b-a}\,E[(S-a)_+] + \frac{d-a}{b-a}\,E[(S-b)_+].
\]

That is, the net stop-loss premium can be calculated via linear interpolation.


Proof: From the assumption, $F_S(x) = F_S(a)$ for $a \le x < b$. Then,

\[
\begin{aligned}
E[(S-d)_+] &= \int_{d}^{\infty} [1 - F_S(x)]\,dx \\
 &= \int_{a}^{\infty} [1 - F_S(x)]\,dx - \int_{a}^{d} [1 - F_S(x)]\,dx \\
 &= E[(S-a)_+] - \int_{a}^{d} [1 - F_S(a)]\,dx \\
 &= E[(S-a)_+] - (d-a)[1 - F_S(a)].
\end{aligned}
\tag{6.7}
\]

Then, by setting d = b in (6.7),

\[
E[(S-b)_+] = E[(S-a)_+] - (b-a)[1 - F_S(a)]
\]

and therefore

\[
1 - F_S(a) = \frac{E[(S-a)_+] - E[(S-b)_+]}{b-a}.
\]

Substituting this in (6.7) produces the desired result. □


Further simplification is available in the discrete case, provided S places 
probability at equally spaced values. 


Theorem 6.9 Assume $\Pr(S = kh) = f_k \ge 0$ for some fixed $h > 0$ and
$k = 0, 1, \ldots$ and $\Pr(S = x) = 0$ for all other x. Then, provided $d = jh$, with
j a non-negative integer,

\[
E[(S-d)_+] = h \sum_{m=0}^{\infty} \{1 - F_S[(m+j)h]\}.
\]
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Proof: 
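\[
\begin{aligned}
E[(S-d)_+] &= \sum_{x>d} (x-d) f_S(x) \\
 &= \sum_{k=j}^{\infty} (kh - jh) f_k \\
 &= h \sum_{k=j}^{\infty} \sum_{m=0}^{k-j-1} f_k \\
 &= h \sum_{m=0}^{\infty} \sum_{k=m+j+1}^{\infty} f_k \\
 &= h \sum_{m=0}^{\infty} \{1 - F_S[(m+j)h]\}.
\end{aligned}
\]
□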


In the discrete case with probability at equally spaced values, a simple 
recursion holds. 


Corollary 6.10 Under the conditions of Theorem 6.9,

\[
E\{[S - (j+1)h]_+\} = E[(S - jh)_+] - h[1 - F_S(jh)].
\]

This result is easy to use because, when d = 0, $E[(S-0)_+] = E(S) =
E(N)E(X)$, which can be obtained directly from the frequency and severity
distributions.


Example 6.11 (Example 6.6 continued) The insurer is examining the effect 
of imposing an aggregate deductible per employee. Determine the reduction in 
the net premium as a result of imposing deductibles of 25, 30, 50, and 100 
dollars. 


From Table 6.3, the cdf at 0, 25, 50, and 75 dollars has values 0.05, 0.065, 
0.08838, and 0.12306. With E(S) = 25(12.58) = 314.5 we have 


\[
\begin{aligned}
E[(S-25)_+] &= 314.5 - 25(1 - 0.05) = 290.75, \\
E[(S-50)_+] &= 290.75 - 25(1 - 0.065) = 267.375, \\
E[(S-75)_+] &= 267.375 - 25(1 - 0.08838) = 244.5845, \\
E[(S-100)_+] &= 244.5845 - 25(1 - 0.12306) = 222.661.
\end{aligned}
\]

From Theorem 6.8, $E[(S-30)_+] = \frac{20}{25}(290.75) + \frac{5}{25}(267.375) = 286.07$. When
compared to the original premium of 314.5, the reductions are 23.75, 28.43,
47.125, and 91.839 for the four deductibles. □
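The recursion of Corollary 6.10 and the interpolation of Theorem 6.8 are easy to mechanize. The sketch below reruns Example 6.11; the function names are illustrative, and the four probabilities are the rounded $f_S$ values at 0, 25, 50, and 75 dollars read from Table 6.3.

```python
def stop_loss_on_lattice(mean_s, fs, h, j_max):
    """Return E[(S - j*h)_+] for j = 0, ..., j_max via Corollary 6.10.

    fs[j] is Pr(S = j*h); only fs[0], ..., fs[j_max - 1] are needed.
    """
    premiums = [mean_s]  # E[(S - 0)_+] = E(S)
    cdf = 0.0
    for j in range(j_max):
        cdf += fs[j]  # F_S(j*h)
        premiums.append(premiums[-1] - h * (1.0 - cdf))
    return premiums


def interpolate_stop_loss(d, a, b, prem_a, prem_b):
    """Theorem 6.8: linear interpolation when Pr(a < S < b) = 0."""
    return ((b - d) * prem_a + (d - a) * prem_b) / (b - a)


fs = [0.05, 0.015, 0.02338, 0.03468]  # f_S at 0, 25, 50, 75 dollars
premiums = stop_loss_on_lattice(314.5, fs, 25.0, 4)
prem_30 = interpolate_stop_loss(30.0, 25.0, 50.0, premiums[1], premiums[2])

for d, prem in zip((0, 25, 50, 75, 100), premiums):
    print(d, round(prem, 4))
print(30, round(prem_30, 3))
```

The interpolation applies at a deductible of 30 because S takes only multiples of 25 dollars, so no probability lies strictly between 25 and 50.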
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6.3.1 Exercises 


6.4 From (6.5), show that the relationships between the moments in (6.6) 
hold. 


6.5 When an individual is admitted to the hospital, the hospital charges have
the following characteristics:

1.  Charges   Mean    Standard deviation
    Room      1,000   500
    Other     500     300

2. The covariance between an individual’s room charges and other charges
is 100,000.


An insurer issues a policy that reimburses 100% for room charges and 
80% for other charges. The number of hospital admissions has a Poisson 
distribution with parameter 4. Determine the mean and standard deviation 
of the insurer’s payout for the policy. 


6.6 Aggregate claims have been modeled by a compound negative binomial
distribution with parameters $r = 15$ and $\beta = 5$. The claim amounts are
uniformly distributed on the interval (0,10). Using the normal approximation,
determine the premium such that the probability that claims will exceed
premium is 0.05.


6.7 Automobile drivers can be divided into three homogeneous classes. The
number of claims for each driver follows a Poisson distribution with parameter
$\lambda$. Determine the variance of the number of claims for a randomly selected
driver, using the following data.

Class   Proportion of population   λ
1       0.25                       5
2       0.25                       3
3       0.50                       2


6.8 Assume $X_1$, $X_2$, and $X_3$ are mutually independent loss random variables
with probability functions as given in Table 6.4. Determine the pf of
$S = X_1 + X_2 + X_3$.


6.9 Assume $X_1$, $X_2$, and $X_3$ are mutually independent random variables
with probability functions as given in Table 6.5. If $S = X_1 + X_2 + X_3$ and
$f_S(5) = 0.06$, determine p.
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Table 6.4 Distributions for Exercise 6.8 


x    f_1(x)   f_2(x)   f_3(x)
0    0.90     0.50     0.25
1    0.10     0.30     0.25
2    0.00     0.20     0.25
3    0.00     0.00     0.25

Table 6.5 Distributions for Exercise 6.9

x    f_1(x)   f_2(x)   f_3(x)
0    p        0.6      0.25
1    1 - p    0.2      0.25
2    0        0.1      0.25
3    0        0.1      0.25


6.10 Consider the following information about AIDS patients. 


1. The conditional distribution of an individual’s medical care costs, given 
that the individual does not have AIDS, has mean 1,000 and variance 
250,000. 


2. The conditional distribution of an individual’s medical care costs, given 
that the individual does have AIDS, has mean 70,000 and variance 
1,600,000. 


3. The number of individuals with AIDS in a group of m randomly selected 
adults has a binomial distribution with parameters m and q = 0.01. 


An insurance company determines premiums for a group as the mean
plus 10% of the standard deviation of the group’s aggregate claims
distribution. The premium for a group of 10 independent lives for which all
individuals have been proven not to have AIDS is P. The premium for a group of
10 randomly selected adults is Q. Determine P/Q.


6.11 You have been asked by a city planner to analyze office cigarette smoking 
patterns. The planner has provided the information in Table 6.6 about the 
distribution of the number of cigarettes smoked during a workday. 

The number of male employees in a randomly selected office of N employ- 
ees has a binomial distribution with parameters N and 0.4. Determine the 
mean plus the standard deviation of the number of cigarettes smoked during 
a workday in a randomly selected office of eight employees. 


Table 6.6 Data for Exercise 6.11

            Male   Female
Mean        6      3
Variance    64     31


6.12 For a certain group, aggregate claims are uniformly distributed over
(0,10). Insurer A proposes stop-loss coverage with a deductible of 6 for a
premium equal to the expected stop-loss claims. Insurer B proposes group
coverage with a premium of 7 and a dividend (a premium refund) equal to
the excess, if any, of 7k over claims. Calculate k such that the expected cost
to the group is equal under both proposals.


6.13 For a group health contract, aggregate claims are assumed to have an 
exponential distribution where the mean of the distribution is estimated by the 
group underwriter. Aggregate stop-loss insurance for total claims in excess of 
125% of the expected claims is provided by a gross premium that is twice the 
expected stop-loss claims. You have discovered an error in the underwriter’s 
method of calculating expected claims. The underwriter’s estimate is 90% of 
the correct estimate. Determine the actual percentage loading in the premium. 


6.14 A random loss, X, has the following probability function. You are given 
that E(X) = 4 and E[(X — d)+] = 2. Determine d. 


x    f(x)
0    0.05
1    0.06
2    0.25
3    0.22
4    0.10
5    0.05
6    0.05
7    0.05
8    0.05
9    0.12


6.15 A reinsurer pays aggregate claim amounts in excess of d, and in return
it receives a stop-loss premium $E[(S-d)_+]$. You are given $E[(S-100)_+] = 15$,
$E[(S-120)_+] = 10$, and the probability that the aggregate claim amounts are
greater than 80 and less than or equal to 120 is 0. Determine the probability
that the aggregate claim amounts are less than or equal to 80.


6.16 A loss random variable X has pdf $f(x) = \frac{1}{100}$, $0 < x < 100$. Two
policies can be purchased to alleviate the financial impact of the loss:

\[
A = \begin{cases} 0, & x < 50k, \\ \dfrac{x}{k} - 50, & x \ge 50k, \end{cases}
\]

and

\[
B = kx, \quad 0 < x < 100,
\]

where A and B are the amounts paid when the loss is x. Both policies have
the same net premium, that is, E(A) = E(B). Determine k.


6.17 For a nursing home insurance policy, you are given that the average 
length of stay is 440 days and 30% of the stays are terminated in the first 
30 days. These terminations are distributed uniformly during that period. 
The policy pays 20 per day for the first 30 days and 100 per day thereafter. 
Determine the expected benefits payable for a single stay. 


6.18 An insurance portfolio produces N claims, where 


wrols 
oO 
on 


Individual claim amounts have the following distribution:

 x    f_X(x)
 1    0.9
10    0.1

Individual claim amounts and N are mutually independent. Calculate the
probability that the ratio of aggregate claims to expected claims will exceed
3.0.


6.19 A company sells group travel-accident life insurance with m payable 
in the event of a covered individual’s death in a travel accident. The gross 
premium for a group is set equal to the expected value plus the standard 
deviation of the group’s aggregate claims. The standard premium is based on 
the following assumptions: 


1. All individual claims within the group are mutually independent and 
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2. m?q(1—q) = 2,500, where q is the probability of death by travel accident 
for an individual. 


In a certain group of 100 lives, the independence assumption fails because
three specific individuals always travel together. If one dies in an accident,
all three are assumed to die. Determine the difference between this group’s
premium and the standard premium.


6.20 A life insurance company covers 16,000 lives for one-year term life in- 
surance, as shown below. 


Benefit Number Probability of 
amount covered claim 
1 8,000 0.025 
2 3,500 0.025 
4 4,500 0.025 


All claims are mutually independent. The insurance company’s retention
limit is 2 units per life. Reinsurance is purchased for 0.03 per unit. The
probability that the insurance company’s retained claims, S, plus cost of
reinsurance will exceed 1,000 is

\[
\Pr\left[ \frac{S - E(S)}{\sqrt{\operatorname{Var}(S)}} > K \right].
\]

Determine K using a normal approximation.

6.21 The probability density function of individual losses Y is

\[
f(y) = \begin{cases} 0.02\left(1 - \dfrac{y}{100}\right), & 0 < y < 100, \\ 0, & \text{elsewhere.} \end{cases}
\]

The amount paid, Z, is 80% of that portion of the loss that exceeds a
deductible of 10. Determine E(Z).


6.22 An individual loss distribution is normal with $\mu = 100$ and $\sigma^2 = 9$. The
distribution for the number of claims, N, is given in Table 6.7. Determine the
probability that aggregate claims exceed 100.


Table 6.7 Distribution for Exercise 6.22

n    Pr(N = n)
0    0.5
1    0.2
2    0.2
3    0.1
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Table 6.8 Distribution for Exercise 6.23 


n    f(n)
0    1/16
1    1/4
2    3/8
3    1/4
4    1/16


6.23 An employer self-insures a life insurance program with the following 
characteristics: 


1. Given that a claim has occurred, the claim amount is 2,000 with prob- 
ability 0.4 and 3,000 with probability 0.6, 


2. The number of claims has the distribution given in Table 6.8. 


The employer purchases aggregate stop-loss coverage that limits the em- 
ployer’s annual claims cost to 5,000. The aggregate stop-loss coverage costs 
1,472. Determine the employer’s expected annual cost of the program, includ- 
ing the cost of stop-loss coverage. 


6.24 The probability that an individual admitted to the hospital will stay k
days or less is $1 - 0.8^k$ for $k = 0, 1, 2, \ldots$. A hospital indemnity policy provides
a fixed amount per day for the 4th day through the 10th day (that is, for a
maximum of 7 days). Determine the percentage increase in the expected cost
per admission if the maximum number of days paid is increased from 7 to 14.


6.25 The probability density function of aggregate claims, S, is given by
$f_S(x) = 3x^{-4}$, $x \ge 1$. The relative loading $\theta$ and the value $\lambda$ are selected so
that

\[
\Pr[S \le (1+\theta)E(S)] = \Pr\left[S \le E(S) + \lambda\sqrt{\operatorname{Var}(S)}\right] = 0.90.
\]

Calculate $\lambda$ and $\theta$.


6.4 ANALYTIC RESULTS


For most choices of distributions of N and the X;s, the compound distribu- 
tional values can only be obtained numerically. Subsequent sections in this 
chapter are devoted to such numerical procedures. 

However, for certain combinations of choices, simple analytic results are 
available, thus reducing the computational problems considerably. 
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Example 6.12 (Compound geometric-exponential) Suppose Xi, X2,... are 
i.i.d. with common exponential distribution with mean $\theta$ and mgf
$M_X(z) = (1 - \theta z)^{-1}$. Suppose that N has a geometric distribution with
parameter $\beta$ and pgf $P_N(z) = [1 - \beta(z-1)]^{-1}$ (see Appendix B). Determine the
distribution of S.

The mgf of S is

\[
\begin{aligned}
M_S(z) &= P_N[M_X(z)] \\
 &= \{1 - \beta[(1 - \theta z)^{-1} - 1]\}^{-1} \\
 &= \frac{1}{1+\beta} + \frac{\beta}{1+\beta}\,[1 - \theta(1+\beta)z]^{-1}
\end{aligned}
\]

with a bit of algebra.

This is a two-point mixture of a degenerate distribution with probability 1
at zero and an exponential distribution with mean $\theta(1+\beta)$. Hence,
$\Pr(S = 0) = (1+\beta)^{-1}$, and for $x > 0$, S has pdf

\[
f_S(x) = \frac{\beta}{\theta(1+\beta)^2} \exp\left[ -\frac{x}{\theta(1+\beta)} \right].
\]

It has a point mass of $(1+\beta)^{-1}$ at zero and an exponentially decaying density
over the positive axis. Its cdf can be written as

\[
F_S(x) = 1 - \frac{\beta}{1+\beta} \exp\left[ -\frac{x}{\theta(1+\beta)} \right], \quad x \ge 0.
\]

It has a jump at zero and is continuous otherwise. This example will arise
again in Chapter 8 in connection with ruin theory. □


Example 6.13 (Exponential severities) Determine the cdf of S for any com- 
pound distribution with exponential severities. 


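The mgf of the sum of n independent exponential random variables each
with mean $\theta$ is

\[
M_{X_1 + X_2 + \cdots + X_n}(z) = (1 - \theta z)^{-n},
\]

which is the mgf of the gamma distribution with cdf

\[
F_X^{*n}(x) = \Gamma\left(n; \frac{x}{\theta}\right)
\]

(see Appendix A). For integer values of $\alpha$, the values of $\Gamma(\alpha; x)$ can be
calculated exactly (see Appendix A for the derivation) as

\[
\Gamma(n; x) = 1 - \sum_{j=0}^{n-1} \frac{x^j e^{-x}}{j!}, \quad n = 1, 2, 3, \ldots.
\tag{6.8}
\]

From (6.3),

\[
F_S(x) = p_0 + \sum_{n=1}^{\infty} p_n \Gamma\left(n; \frac{x}{\theta}\right).
\]

Substituting in (6.8) yields

\[
F_S(x) = 1 - \sum_{n=1}^{\infty} p_n \sum_{j=0}^{n-1} \frac{(x/\theta)^j e^{-x/\theta}}{j!}, \quad x \ge 0.
\tag{6.9}
\]

Interchanging the order of summation yields

\[
\begin{aligned}
F_S(x) &= 1 - e^{-x/\theta} \sum_{j=0}^{\infty} \frac{(x/\theta)^j}{j!} \sum_{n=j+1}^{\infty} p_n \\
 &= 1 - e^{-x/\theta} \sum_{j=0}^{\infty} \bar{P}_j \frac{(x/\theta)^j}{j!}, \quad x \ge 0,
\end{aligned}
\]

where $\bar{P}_j = \sum_{n=j+1}^{\infty} p_n$ for $j = 0, 1, \ldots$. □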
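The final formula of Example 6.13 lends itself to direct evaluation. The sketch below truncates the frequency distribution at a finite number of terms (an assumption that must be checked for the distribution at hand) and compares the result with the closed form of Example 6.12 for a geometric frequency. The function name and the parameter values are illustrative.

```python
from math import exp


def compound_cdf_exponential(x, pn, theta):
    """F_S(x) = 1 - exp(-x/theta) * sum_j Pbar_j (x/theta)^j / j!.

    pn[n] = Pr(N = n) for n = 0, ..., len(pn) - 1; the sum over j is cut
    off at the same point, so pn should carry almost all the probability.
    """
    u = x / theta
    total = 0.0
    term = 1.0  # (x/theta)^j / j!, starting at j = 0
    cum = 0.0   # Pr(N <= j)
    for j, p in enumerate(pn):
        cum += p
        total += (1.0 - cum) * term  # Pbar_j = Pr(N > j)
        term *= u / (j + 1)
    return 1.0 - exp(-u) * total


# Geometric frequency with parameter beta: p_n = beta^n / (1 + beta)^(n + 1).
beta, theta = 2.0, 1.5
pn = [(1.0 / (1.0 + beta)) * (beta / (1.0 + beta)) ** n for n in range(400)]

x = 3.0
series_value = compound_cdf_exponential(x, pn, theta)
closed_form = 1.0 - beta / (1.0 + beta) * exp(-x / (theta * (1.0 + beta)))
print(series_value, closed_form)
```

Using $1 - \Pr(N \le j)$ for the tail probability, rather than summing the truncated tail, keeps the truncation error confined to the outer sum over j.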


The approach of Example 6.13 may be extended to the larger class of mixed 
Erlang severity distributions, as shown in Exercise 6.35. 

For frequency distributions which assign positive probability to all nonneg- 
ative integers, (6.9) can be evaluated by taking sufficient terms in the first 
summation. For distributions for which Pr(N > n*) = 0, the first summation 
becomes finite. For example, for the binomial frequency distribution, (6.9) 
becomes 


\[
F_S(x) = 1 - \sum_{n=1}^{m} \binom{m}{n} q^n (1-q)^{m-n} \sum_{j=0}^{n-1} \frac{(x/\theta)^j e^{-x/\theta}}{j!}.
\tag{6.10}
\]


Example 6.14 (Compound negative binomial-exponential) Determine the 
distribution of S when the frequency distribution is negative binomial with an 
integer value for the parameter r and the severity distribution is exponential. 


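The mgf of S is

\[
\begin{aligned}
M_S(z) &= P_N[M_X(z)] \\
 &= P_N[(1 - \theta z)^{-1}] \\
 &= \{1 - \beta[(1 - \theta z)^{-1} - 1]\}^{-r}.
\end{aligned}
\]

With a bit of algebra, this can be rewritten as

\[
M_S(z) = \left( 1 + \frac{\beta}{1+\beta} \left\{ [1 - \theta(1+\beta)z]^{-1} - 1 \right\} \right)^{r},
\]

which is of the form

\[
M_S(z) = P_N^{*}[M_X^{*}(z)],
\]

where

\[
P_N^{*}(z) = \left[ 1 + \frac{\beta}{1+\beta}(z - 1) \right]^{r},
\]

the pgf of the binomial distribution with parameters r and $\beta/(1+\beta)$, and
$M_X^{*}(z)$ is the mgf of the exponential distribution with mean $\theta(1+\beta)$.

This transformation reduces the computation of the distribution function
to the finite sum of the form of (6.10), that is,

\[
F_S(x) = 1 - \sum_{n=1}^{r} \binom{r}{n} \left( \frac{\beta}{1+\beta} \right)^{n} \left( \frac{1}{1+\beta} \right)^{r-n}
  \sum_{j=0}^{n-1} \frac{[x\theta^{-1}(1+\beta)^{-1}]^{j} \exp[-x\theta^{-1}(1+\beta)^{-1}]}{j!}.
\]
□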
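The finite sum of Example 6.14 can be coded directly. This is a minimal sketch assuming an integer r; the function name and the parameter values are illustrative.

```python
from math import comb, exp, factorial


def negbin_exponential_cdf(x, r, beta, theta):
    """Cdf of a compound negative binomial-exponential S for integer r.

    Uses the equivalent compound binomial form: binomial(r, beta/(1+beta))
    frequency with exponential severities of mean theta*(1+beta).
    """
    q = beta / (1.0 + beta)
    u = x / (theta * (1.0 + beta))
    total = 0.0
    for n in range(1, r + 1):
        weight = comb(r, n) * q**n * (1.0 - q) ** (r - n)
        # Survival function at u of a sum of n unit-mean exponentials.
        erlang_tail = sum(u**j * exp(-u) / factorial(j) for j in range(n))
        total += weight * erlang_tail
    return 1.0 - total


print(negbin_exponential_cdf(4.0, 3, 2.0, 1.0))
```

At x = 0 the function returns $(1+\beta)^{-r}$, the probability of no claims, and for r = 1 it reduces to the cdf of Example 6.12.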
Example 6.15 (Severity distributions closed under convolution) A distribution
is said to be closed under convolution if adding i.i.d. members of a
family produces another member of that family. Further assume that adding n
members of a family produces a member with all but one parameter unchanged
and the remaining parameter is multiplied by n. Determine the distribution
of S when the severity distribution has this property.


The condition means that, if $f_X(x; a)$ is the pf of each $X_j$, then the pf of
$X_1 + X_2 + \cdots + X_n$ is $f_X(x; na)$. This means that

\[
\begin{aligned}
f_S(x) &= \sum_{n=1}^{\infty} p_n f_X^{*n}(x; a) \\
 &= \sum_{n=1}^{\infty} p_n f_X(x; na),
\end{aligned}
\]

eliminating the need to carry out evaluation of the convolution. Severity
distributions that are closed under convolution include the gamma and inverse
Gaussian distributions. See Exercise 6.26. □
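As an illustration of Example 6.15, the sketch below takes a gamma severity with shape $\alpha$ and scale $\theta$, for which the sum of n i.i.d. copies is gamma with shape $n\alpha$ and the same scale. The function names, the truncation of the frequency distribution, and the geometric test case are assumptions made for this illustration.

```python
from math import exp, lgamma, log


def gamma_pdf(x, shape, scale):
    """Gamma density at x > 0, computed on the log scale for stability."""
    return exp((shape - 1.0) * log(x) - x / scale - lgamma(shape) - shape * log(scale))


def compound_density_gamma(x, pn, alpha, theta):
    """f_S(x) = sum_{n>=1} p_n f_X(x; n*alpha) for x > 0 (pn truncated)."""
    return sum(p * gamma_pdf(x, n * alpha, theta) for n, p in enumerate(pn) if n >= 1)


# Check against Example 6.12: geometric frequency, exponential severity
# (a gamma with alpha = 1).
beta, theta = 2.0, 1.5
pn = [(1.0 / (1.0 + beta)) * (beta / (1.0 + beta)) ** n for n in range(300)]
x = 2.0
density = compound_density_gamma(x, pn, 1.0, theta)
closed_form = beta / (theta * (1.0 + beta) ** 2) * exp(-x / (theta * (1.0 + beta)))
print(density, closed_form)
```

No convolution is evaluated anywhere: each term of the sum is a single gamma density with a rescaled shape parameter.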


6.4.1 Exercises 


6.26 The following questions concern closure under convolution. 


(a) Show that the gamma and inverse Gaussian distributions are closed 
under convolution. Show that the gamma distribution has the 
additional property mentioned in Example 6.15. 
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(b) Discrete distributions can also be used as severity distributions. 
Which of the distributions in Appendix B are closed under convo- 
lution? How can this information be used in simplifying calculation 
of compound probabilities of the form (4.20)? 


6.27 A compound negative binomial distribution has parameters $\beta = 1$,
$r = 2$, and severity distribution $\{f_X(x);\ x = 0, 1, 2, \ldots\}$. How do the
parameters of the distribution change if the severity distribution is
$\{g_X(x) = f_X(x)/[1 - f_X(0)];\ x = 1, 2, \ldots\}$ but the aggregate claims
distribution remains unchanged?


6.28 Consider the compound logarithmic distribution with exponential severity
distribution.

(a) Show that the density of aggregate losses may be expressed as

\[
f_S(x) = \frac{1}{\ln(1+\beta)} \sum_{n=1}^{\infty} \frac{1}{n!} \left[ \frac{\beta}{\theta(1+\beta)} \right]^{n} x^{n-1} e^{-x/\theta}.
\]

(b) Reduce this to

\[
f_S(x) = \frac{\exp\{-x/[\theta(1+\beta)]\} - \exp(-x/\theta)}{x \ln(1+\beta)}.
\]


6.29 An insurance policy reimburses aggregate incurred expenses at the rate
of 80% of the first 1,000 in excess of 100, 90% of the next 1,000, and 100%
thereafter. Express the expected cost of this coverage in terms of
$R_d = E[(S-d)_+]$ for different values of d.


6.30 The number of accidents incurred by an insured driver in a single year 
has a Poisson distribution with parameter $\lambda = 2$. If an accident occurs,
the probability that the damage amount exceeds the deductible is 0.25. The 
number of claims and the damage amounts are independent. What is the 
probability that there will be no damages exceeding the deductible in a single 
year? 


6.31 The aggregate loss distribution is modeled by an insurance company 
using an exponential distribution. However, the mean is uncertain. The 
company uses a uniform distribution (2,000,000, 4,000,000) to express its 
view of what the mean should be. Determine the expected aggregate losses. 


6.32 A group hospital indemnity policy provides benefits at a continuous rate
of 100 per day of hospital confinement for a maximum of 30 days. Benefits for
partial days of confinement are prorated. The length of hospital confinement
in days, T, has the following continuance (survival) function for $0 \le t \le 30$:

\[
\Pr(T > t) = \begin{cases}
1 - 0.04t, & 0 \le t \le 10, \\
0.95 - 0.035t, & 10 < t \le 20, \\
0.65 - 0.02t, & 20 < t \le 30.
\end{cases}
\]

For a policy period, each member’s probability of a single hospital admission
is 0.1 and of more than one admission is 0. Determine the pure premium per
member, ignoring the time value of money.


6.33 Medical and dental claims are assumed to be independent with com- 
pound Poisson distributions as follows: 


Claim type Claim amount distribution 


Medical claims Uniform (0, 1,000) 
Dental claims Uniform (0, 200) 


wrvrvi»x 


Let X be the amount of a given claim under a policy which covers both
medical and dental claims. Determine $E[(X - 100)_+]$, the expected cost (in
excess of 100) of any given claim.


6.34 For a certain insured, the distribution of aggregate claims is binomial 
with parameters m = 12 and q = 0.25. The insurer will pay a dividend, 
D, equal to the excess of 80% of the premium over claims, if positive. The 
premium is 5. Determine E[D]. 


6.35 Consider a severity distribution which is a finite mixture of gamma
distributions with integer shape parameters (such gamma distributions are
called Erlang distributions), that is, which may be expressed as

\[
f_X(x) = \sum_{k=1}^{r} q_k \frac{\theta^{-k} x^{k-1} e^{-x/\theta}}{(k-1)!}.
\]

(a) Show that the moment generating function may be written as

\[
M_X(z) = Q\{(1 - \theta z)^{-1}\},
\]

where

\[
Q(z) = \sum_{k=1}^{r} q_k z^k
\]

is the pgf of the distribution $\{q_1, q_2, \ldots, q_r\}$. Thus interpret $f_X(x)$
as the pf of a compound distribution.

(b) Show that the mgf of S is

\[
M_S(z) = C\{(1 - \theta z)^{-1}\},
\]

where

\[
C(z) = \sum_{k=0}^{\infty} c_k z^k = P_N\{Q(z)\}.
\]

(c) Describe how the distribution $\{c_k;\ k = 0, 1, 2, \ldots\}$ may be calculated
recursively if the number of claims distribution is a member
of the (a,b,1) class (Section 4.6.6).

(d) Show that the distribution function of S is given by

\[
F_S(x) = 1 - e^{-x/\theta} \sum_{j=0}^{\infty} \bar{C}_j \frac{(x/\theta)^j}{j!}, \quad x \ge 0,
\]

where $\bar{C}_j = \sum_{n=j+1}^{\infty} c_n$.


6.5 COMPUTING THE AGGREGATE CLAIMS DISTRIBUTION


The computation of the compound distribution function

\[
F_S(x) = \sum_{n=0}^{\infty} p_n F_X^{*n}(x)
\tag{6.11}
\]


or the corresponding probability (density) function is generally not an easy 
task, even in the simplest of cases. In this section we discuss a number of 
approaches to numerical evaluation of (6.11) for specific choices of the fre- 
quency and severity distributions as well as for arbitrary choices of one or 
both distributions. 

One approach is to use an approximating distribution to avoid direct 
calculation of (6.11). This approach was used in Example 6.5 where the 
method of moments was used to estimate the parameters of the approximating 
distribution. The advantage of this method is that it is simple and easy to 
apply. However, the disadvantages are significant. First, there is no way of 
knowing how good the approximation is. Choosing different approximating 
distributions can result in very different results, particularly in the right- 
hand tail of the distribution. Of course, the approximation should improve 
as more moments are used; but after four moments, one quickly runs out of 
distributions! 

The approximating distribution may also fail to accommodate special fea- 
tures of the true distribution. For example, when the loss distribution is of 
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the continuous type and there is a maximum possible claim (for example, 
when there is a policy limit), the severity distribution may have a point mass 
(“atom” or “spike”) at the maximum. The true aggregate claims distribution 
is of the mixed type with spikes at integral multiples of the maximum corre- 
sponding to 1,2,3,... claims at the maximum. These spikes, if large, can have 
a significant effect on the probabilities near such multiples. These jumps in 
the aggregate claims distribution function cannot be replicated by a smooth 
approximating distribution. 

The second method to evaluate (6.11) or the corresponding pdf is direct 
calculation. The most difficult (or computer intensive) part is the evalua- 
tion of the n-fold convolutions of the severity distribution for n = 2,3,4,.... 
In some situations, there is an analytic form—for example, when the sever- 
ity distribution is closed under convolution, as defined in Example 6.15 and 
illustrated in Examples 6.12-6.14. Otherwise the convolutions need to be 
evaluated numerically using 


\[
F_X^{*n}(x) = \int_{-\infty}^{\infty} F_X^{*(n-1)}(x - y)\,dF_X(y).
\tag{6.12}
\]

When the losses are limited to nonnegative values (as is usually the case), the
range of integration becomes finite, reducing (6.12) to

\[
F_X^{*n}(x) = \int_{0}^{x} F_X^{*(n-1)}(x - y)\,dF_X(y).
\tag{6.13}
\]

These integrals are written in Lebesgue-Stieltjes form because of possible
jumps in the cdf $F_X(x)$ at zero and at other points.¹ Numerical evaluation
of (6.13) requires numerical integration methods. Because of the first term
inside the integral, (6.13) needs to be evaluated for all possible values of x.
This quickly becomes technically overpowering!

A simple way to avoid these technical problems is to replace the severity 
distribution by a discrete distribution defined at multiples 0,1,2... of some 
convenient monetary unit such as 1,000. This reduces (6.13) to (in terms of 
the new monetary unit) 


\[
F_X^{*n}(x) = \sum_{y=0}^{x} F_X^{*(n-1)}(x - y) f_X(y).
\]

The corresponding pf is

\[
f_X^{*n}(x) = \sum_{y=0}^{x} f_X^{*(n-1)}(x - y) f_X(y).
\]


¹ Without going into the formal definition of the Lebesgue-Stieltjes integral, it suffices to
interpret $\int g(y)\,dF_X(y)$ as to be evaluated by integrating $g(y) f_X(y)$ over those y values for
which X has a continuous distribution and then adding $g(y_i)\Pr(X = y_i)$ over those points
where $\Pr(X = y_i) > 0$. This allows for a single notation to be used for continuous, discrete,
and mixed random variables.
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In practice, the monetary unit can be made sufficiently small to accom- 
modate spikes at maximum insurance amounts. One needs only the spike to 
be a multiple of the monetary unit to have it located at exactly the right 
point. As the monetary unit of measurement becomes small, the discrete 
distribution function needs to approach the true distribution function. The 
simplest approach is to round all amounts to the nearest multiple of the mon- 
etary unit; for example, round all losses or claims to the nearest 1,000. More 
sophisticated methods will be discussed later in this chapter. 
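The rounding approach can be sketched as follows. The helper below assigns to each multiple jh the probability that a loss rounds to it; lumping all remaining probability at the largest point mh is an assumption of this sketch, as are the function name and the exponential example.

```python
from math import exp


def discretize_by_rounding(cdf, h, m):
    """Discretize a nonnegative loss on 0, h, 2h, ..., m*h by rounding.

    Point j*h receives Pr(j*h - h/2 <= X < j*h + h/2); everything at or
    above m*h - h/2 is placed at m*h (a choice made for this sketch).
    """
    probs = [cdf(h / 2.0)]  # mass rounded to zero
    for j in range(1, m):
        probs.append(cdf(j * h + h / 2.0) - cdf(j * h - h / 2.0))
    probs.append(1.0 - cdf(m * h - h / 2.0))  # remaining mass at m*h
    return probs


# Exponential losses with mean 1,000, discretized in units of 100.
theta = 1000.0
h = 100.0
probs = discretize_by_rounding(lambda x: 1.0 - exp(-x / theta), h, 200)
discrete_mean = sum(j * h * p for j, p in enumerate(probs))
print(len(probs), sum(probs), discrete_mean)
```

The discretized mean is close to, but not exactly equal to, the true mean of 1,000. This is one reason more refined discretization methods are of interest.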

When the severity distribution is defined on nonnegative integers 0, 1, 2, ...,
calculating $f_X^{*k}(x)$ for integral x requires x + 1 multiplications. Then carrying
out these calculations for all possible values of k and x up to n requires a
number of multiplications that is of order $n^3$, written as $O(n^3)$, to obtain
the distribution of (6.11) for x = 0 to x = n. When the maximum value, n,
for which the aggregate claims distribution is calculated is large, the number
of computations quickly becomes prohibitive, even for fast computers. For
example, in real applications n can easily be as large as 1,000. This requires
about $10^9$ multiplications. Further, if Pr(X = 0) > 0, an infinite number
of calculations are required to obtain any single probability. This is because
$F_X^{*n}(x) > 0$ for all n and all x and so the sum in (6.11) contains an infinite
number of terms. When Pr(X = 0) = 0, we have $F_X^{*n}(x) = 0$ for n > x and
so (6.11) will have no more than x + 1 positive terms. Table 6.3 provides an
example of this latter case.

Alternative methods to more quickly evaluate the aggregate claims
distribution are discussed in the next two sections. The first such method, the
recursive method, reduces the number of computations discussed above
to $O(n^2)$, which is a considerable savings in computer time, a reduction of
about 99.9% when n = 1,000 compared to direct calculation. However, the
method is limited to certain frequency distributions. Fortunately, it includes
all frequency distributions discussed in Section 4.6 and Appendix B.

The second such method, the inversion method, numerically inverts a 
transform, such as the characteristic function, using a general or specialized 
inversion software package. Two versions of this method are discussed in this 
chapter. 


6.6 THE RECURSIVE METHOD


Suppose that the severity distribution $f_X(x)$ is defined on 0, 1, 2, ..., m
representing multiples of some convenient monetary unit. The number m represents
the largest possible payment and could be infinite. Further, suppose that the
frequency distribution, $p_k$, is a member of the (a,b,1) class and therefore
satisfies

\[
p_k = \left( a + \frac{b}{k} \right) p_{k-1}, \quad k = 2, 3, 4, \ldots.
\]

Then the following result holds.
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Theorem 6.16 For the (a,b, 1) class, 


\[
f_S(x) = \frac{[p_1 - (a+b)p_0] f_X(x) + \sum_{y=1}^{x \wedge m} (a + by/x) f_X(y) f_S(x - y)}{1 - a f_X(0)},
\tag{6.14}
\]

noting that $x \wedge m$ is notation for $\min(x, m)$.


Proof: This result is identical to Theorem 4.49 with appropriate substitution 
of notation and recognition that the argument of fx(x) cannot exceed m. O 


Corollary 6.17 For the (a,b,0) class, the result (6.14) reduces to

\[
f_S(x) = \frac{\sum_{y=1}^{x \wedge m} (a + by/x) f_X(y) f_S(x - y)}{1 - a f_X(0)}.
\tag{6.15}
\]

Note that when the severity distribution has no probability at zero, the
denominator of (6.14) and (6.15) equals 1. Further, in the case of the Poisson
distribution, (6.15) reduces to

\[
f_S(x) = \frac{\lambda}{x} \sum_{y=1}^{x \wedge m} y f_X(y) f_S(x - y), \quad x = 1, 2, \ldots.
\tag{6.16}
\]


The starting value of the recursive schemes (6.14) and (6.15) is fs(0) = 
Py[fx(0)] following Theorem 4.51 with an appropriate change of notation. 
In the case of the Poisson distribution we have 


POSEN Am, 


Starting values for other frequency distributions are found in Appendix D. 
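
The recursion can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `panjer_recursion` and `compound_poisson` are hypothetical, the severity is assumed to be supplied as a list `fx` of probabilities at 0, 1, ..., m, and the caller is assumed to supply the starting value $f_S(0)$.

```python
import math

def panjer_recursion(a, b, p0, p1, fx, fs0, n_max):
    """Evaluate recursion (6.14) for x = 1, ..., n_max.

    fx[y] is the severity probability at y = 0, ..., m (m = len(fx) - 1);
    fs0 is the starting value f_S(0) = P_N[f_X(0)].
    """
    m = len(fx) - 1
    fs = [fs0]
    denom = 1.0 - a * fx[0]
    for x in range(1, n_max + 1):
        # The leading term vanishes for the (a, b, 0) class, where p1 = (a + b) p0.
        head = (p1 - (a + b) * p0) * (fx[x] if x <= m else 0.0)
        tail = sum((a + b * y / x) * fx[y] * fs[x - y]
                   for y in range(1, min(x, m) + 1))
        fs.append((head + tail) / denom)
    return fs

def compound_poisson(lam, fx, n_max):
    """Poisson special case (6.16): a = 0, b = lam."""
    p0 = math.exp(-lam)
    fs0 = math.exp(-lam * (1.0 - fx[0]))
    return panjer_recursion(0.0, lam, p0, lam * p0, fx, fs0, n_max)

if __name__ == "__main__":
    # Hypothetical severity: every claim equals one unit, so S is Poisson(2).
    print(compound_poisson(2.0, [0.0, 1.0], 5))
```

With every claim equal to one unit the aggregate distribution is simply the Poisson frequency distribution, which gives a quick sanity check on the output.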


6.6.1 Applications to compound frequency models 


When the frequency distribution can be represented as a compound distribu- 
tion (e.g., Neyman Type A, Poisson—inverse Gaussian) involving only distri- 
butions from the (a, b,0) or (a,b,1) classes, the recursive formula (6.14) can 
be used two or more times to obtain the aggregate claims distribution. If the 
frequency distribution can be written as 


\[ P_N(z) = P_1[P_2(z)], \]

then the aggregate claims distribution has pgf

\[ P_S(z) = P_N[P_X(z)] = P_1\{P_2[P_X(z)]\}, \]

which can be rewritten as

\[ P_S(z) = P_1[P_{S_1}(z)], \tag{6.17} \]

where

\[ P_{S_1}(z) = P_2[P_X(z)]. \tag{6.18} \]


Now (6.18) is the same form as an aggregate claims distribution. Thus, if
$P_2(z)$ is in the $(a,b,0)$ or $(a,b,1)$ class, the distribution of $S_1$ can be
calculated using (6.14). The resulting distribution is the “severity”
distribution in (6.17). Thus, a second application of (6.14) to (6.17) results
in the distribution of $S$. The following example illustrates the use of this
algorithm.


Example 6.18 The number of claims has a Poisson-ETNB distribution with
Poisson parameter $\lambda = 2$ and ETNB parameters $\beta = 3$ and $r = 0.2$. The
claim size distribution has probabilities 0.3, 0.5, and 0.2 at 0, 10, and 20, 
respectively. Determine the total claims distribution recursively. 


In the above terminology, $N$ has pgf $P_N(z) = P_1[P_2(z)]$, where $P_1(z)$ and
$P_2(z)$ are the Poisson and ETNB pgfs, respectively. Then the total dollars of
claims has pgf $P_S(z) = P_1[P_{S_1}(z)]$, where $P_{S_1}(z) = P_2[P_X(z)]$ is a
compound ETNB pgf. We will first compute the distribution of $S_1$. We have (in
monetary units of 10) $f_X(0) = 0.3$, $f_X(1) = 0.5$, and $f_X(2) = 0.2$. In order
to use the compound ETNB recursion, we start with

\[
\begin{aligned}
f_{S_1}(0) &= P_2[f_X(0)] \\
&= \frac{\{1 + \beta[1 - f_X(0)]\}^{-r} - (1+\beta)^{-r}}{1 - (1+\beta)^{-r}} \\
&= \frac{\{1 + 3(1 - 0.3)\}^{-0.2} - (1+3)^{-0.2}}{1 - (1+3)^{-0.2}} \\
&= 0.16369.
\end{aligned}
\]


The remaining values of $f_{S_1}(x)$ may be obtained from (6.14) with $S$ replaced
by $S_1$. In this case we have $a = 3/(1+3) = 0.75$, $b = (0.2-1)a = -0.6$, $p_0 = 0$,
and $p_1 = (0.2)(3)/[(1+3)^{0.2+1} - (1+3)] = 0.46947$. Then (6.14) becomes

\[
\begin{aligned}
f_{S_1}(x) &= \frac{[0.46947 - (0.75 - 0.6)(0)]\, f_X(x) + \sum_{y=1}^{x \wedge 2} (0.75 - 0.6y/x)\, f_X(y)\, f_{S_1}(x-y)}{1 - (0.75)(0.3)} \\
&= 0.60577 f_X(x) + 1.29032 \sum_{y=1}^{x \wedge 2} \left(0.75 - 0.6\frac{y}{x}\right) f_X(y)\, f_{S_1}(x-y).
\end{aligned}
\]

The first few probabilities are

\[
\begin{aligned}
f_{S_1}(1) &= 0.60577(0.5) + 1.29032\left[0.75 - 0.6\left(\tfrac{1}{1}\right)\right](0.5)(0.16369) = 0.31873, \\
f_{S_1}(2) &= 0.60577(0.2) + 1.29032\left\{\left[0.75 - 0.6\left(\tfrac{1}{2}\right)\right](0.5)(0.31873) + \left[0.75 - 0.6\left(\tfrac{2}{2}\right)\right](0.2)(0.16369)\right\} = 0.22002, \\
f_{S_1}(3) &= 1.29032\left\{\left[0.75 - 0.6\left(\tfrac{1}{3}\right)\right](0.5)(0.22002) + \left[0.75 - 0.6\left(\tfrac{2}{3}\right)\right](0.2)(0.31873)\right\} = 0.10686, \\
f_{S_1}(4) &= 1.29032\left\{\left[0.75 - 0.6\left(\tfrac{1}{4}\right)\right](0.5)(0.10686) + \left[0.75 - 0.6\left(\tfrac{2}{4}\right)\right](0.2)(0.22002)\right\} = 0.06692.
\end{aligned}
\]


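We now turn to evaluation of the distribution of $S$ with compound Poisson
pgf

\[ P_S(z) = P_1[P_{S_1}(z)] = e^{\lambda[P_{S_1}(z) - 1]}. \]

Thus the distribution

\[ \{f_{S_1}(x);\ x = 0, 1, 2, \ldots\} \]

becomes the “secondary” or “claim size” distribution in an application of the
compound Poisson recursive formula. Therefore,

\[ f_S(0) = P_S(0) = e^{\lambda[P_{S_1}(0) - 1]} = e^{\lambda[f_{S_1}(0) - 1]} = e^{2(0.16369 - 1)} = 0.18775. \]

The remaining probabilities may be found from the recursive formula

\[ f_S(x) = 2 \sum_{y=1}^{x} \frac{y}{x}\, f_{S_1}(y)\, f_S(x-y), \qquad x = 1, 2, \ldots. \]

The first few probabilities are

\[
\begin{aligned}
f_S(1) &= 2\left(\tfrac{1}{1}\right)(0.31873)(0.18775) = 0.11968, \\
f_S(2) &= 2\left(\tfrac{1}{2}\right)(0.31873)(0.11968) + 2\left(\tfrac{2}{2}\right)(0.22002)(0.18775) = 0.12076, \\
f_S(3) &= 2\left(\tfrac{1}{3}\right)(0.31873)(0.12076) + 2\left(\tfrac{2}{3}\right)(0.22002)(0.11968) + 2\left(\tfrac{3}{3}\right)(0.10686)(0.18775) = 0.10090, \\
f_S(4) &= 2\left(\tfrac{1}{4}\right)(0.31873)(0.10090) + 2\left(\tfrac{2}{4}\right)(0.22002)(0.12076) \\
&\quad + 2\left(\tfrac{3}{4}\right)(0.10686)(0.11968) + 2\left(\tfrac{4}{4}\right)(0.06692)(0.18775) = 0.08696.
\end{aligned}
\]
□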
This simple idea can be extended to higher levels of compounding by
repeatedly applying the same concepts. The computer time required to carry
out two applications will be about twice that of one application of (6.14).
However, the total number of computations is still of order $O(x^2)$ rather than
$O(x^3)$ as in the direct method.

When the severity distribution has a maximum possible value at $m$, the
computations are sped up even more because the sum in (6.14) will be
restricted to at most $m$ nonzero terms. In this case, then, the computations
can be considered to be of order $O(x)$.
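
The two-stage calculation of Example 6.18 can be reproduced with a short script. This is a sketch under stated assumptions: the helper name `panjer` and the variable names are illustrative, the ETNB values of a, b, p_1 and the starting value are the ones quoted in the example, and the script carries full floating-point precision, so its results agree with the printed five-decimal figures only to rounding.

```python
import math

def panjer(a, b, p0, p1, fx, fs0, n_max):
    """Recursion (6.14) for an (a, b, 1) frequency and severity fx on 0..m."""
    m = len(fx) - 1
    fs = [fs0]
    for x in range(1, n_max + 1):
        head = (p1 - (a + b) * p0) * (fx[x] if x <= m else 0.0)
        tail = sum((a + b * y / x) * fx[y] * fs[x - y]
                   for y in range(1, min(x, m) + 1))
        fs.append((head + tail) / (1.0 - a * fx[0]))
    return fs

lam, beta, r = 2.0, 3.0, 0.2          # Poisson and ETNB parameters
fx = [0.3, 0.5, 0.2]                  # severity in monetary units of 10

# Stage 1: compound ETNB distribution of S_1.
a = beta / (1.0 + beta)
b = (r - 1.0) * a
p1 = r * beta / ((1.0 + beta) ** (r + 1.0) - (1.0 + beta))
fs1_0 = (((1.0 + beta * (1.0 - fx[0])) ** (-r) - (1.0 + beta) ** (-r))
         / (1.0 - (1.0 + beta) ** (-r)))
fs1 = panjer(a, b, 0.0, p1, fx, fs1_0, 4)

# Stage 2: compound Poisson with f_{S_1} playing the role of the severity.
p0 = math.exp(-lam)
fs = panjer(0.0, lam, p0, lam * p0, fs1, math.exp(lam * (fs1[0] - 1.0)), 4)

print([round(v, 5) for v in fs1])
print([round(v, 5) for v in fs])
```

Only the first five values of each distribution are computed here, which is all the second stage needs in order to produce $f_S(0), \ldots, f_S(4)$.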


6.6.2 Underflow/overflow problems 


The recursion (6.14) starts with the calculated value of $\Pr(S = 0) = P_N[f_X(0)]$.
For large insurance portfolios, this probability is very small, sometimes smaller
than the smallest number that can be represented on the computer. When
this occurs, this initial value is represented on the computer as zero and the
recursion (6.14) fails. This problem can be overcome in several different ways
(see Panjer and Willmot [105]). One of the easiest ways is to start with an
arbitrary set of values for $f_S(0), f_S(1), \ldots, f_S(k)$ such as $(0,0,0,\ldots,0,1)$, where
$k$ is sufficiently far to the left in the distribution so that $F_S(k)$ is still
negligible. Setting $k$ to a point that lies six standard deviations to the left of the
mean is usually sufficient. Recursion (6.14) is used to generate values of the
distribution with this set of starting values until the values are consistently
less than $f_S(k)$. The resulting “probabilities” are then summed, and each is
divided by that sum so that the “true” probabilities add to 1. Trial and error
will dictate how small $k$ should be for a particular problem.
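
The arbitrary-starting-value device can be sketched for the compound Poisson case as follows. The function name `recursion_from_arbitrary_start` is hypothetical, the choice of a fixed upper limit `n_max` (instead of stopping once the values fall consistently below $f_S(k)$) is a simplifying assumption, and the test case uses unit claims so that the answer can be compared with a Poisson probability computed in logarithms.

```python
import math

def recursion_from_arbitrary_start(lam, fx, k, n_max):
    """Compound Poisson recursion (6.16) started from (0, ..., 0, 1) at x = k.

    The values are rescaled at the end so that they add to 1; the result is
    only meaningful if F_S(k) and the mass beyond n_max are negligible.
    """
    m = len(fx) - 1
    f = [0.0] * k + [1.0]
    for x in range(k + 1, n_max + 1):
        f.append(lam / x * sum(y * fx[y] * f[x - y]
                               for y in range(1, min(x, m) + 1)))
    total = sum(f)
    return [v / total for v in f]

if __name__ == "__main__":
    lam = 1000.0                       # exp(-1000) underflows to zero
    k = int(lam - 6.0 * math.sqrt(lam))
    f = recursion_from_arbitrary_start(lam, [0.0, 1.0], k, 1200)
    print(math.exp(-lam), f[1000])
```

Because the recursion is linear in the computed values, the arbitrary scale cancels on normalization; the only error is the probability mass ignored to the left of $k$ and to the right of the stopping point.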

Another method to obtain probabilities when the starting value is too small
is to carry out the calculations for a subset of the portfolio. For example, for
the Poisson distribution with mean $\lambda$, find a value of $\lambda^* = \lambda/2^n$ so that the
probability at zero is representable on the computer when $\lambda^*$ is used as the
Poisson mean. Equation (6.14) is now used to obtain the aggregate claims
distribution when $\lambda^*$ is used as the Poisson mean. If $P_*(z)$ is the pgf of the
aggregate claims using Poisson mean $\lambda^*$, then $P_S(z) = [P_*(z)]^{2^n}$. Hence
one can obtain successively the distributions with pgfs $[P_*(z)]^2$, $[P_*(z)]^4$,
$[P_*(z)]^8, \ldots, [P_*(z)]^{2^n}$ by convoluting the result at each stage with itself. This
requires an additional $n$ convolutions in carrying out the calculations but
involves no approximations. This procedure can be carried out for any frequency
distributions that are closed under convolution. For the negative binomial
distribution, the analogous procedure starts with $r^* = r/2^n$. For the binomial
distribution, the parameter $m$ must be integer valued. A slight modification
can be used. Let $m^* = \lfloor m/2^n \rfloor$, where $\lfloor \cdot \rfloor$ indicates the integer part function.
When the $n$ convolutions are carried out, one still needs to carry out the
calculations using (6.14) for parameter $m - m^* 2^n$. This result is then convoluted
with the result of the $n$ convolutions. For compound frequency distributions,
only the primary distribution needs to be closed under convolution.
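
A minimal sketch of the halving-and-convolution idea for the compound Poisson case is given below. The function names are hypothetical, and the distributions are truncated at `n_max`; this truncation does not affect the computed values at 0, ..., `n_max`, because a convolution at x only involves arguments no larger than x.

```python
import math

def compound_poisson(lam, fx, n_max):
    """Compound Poisson recursion (6.16) on 0, ..., n_max."""
    m = len(fx) - 1
    f = [math.exp(-lam * (1.0 - fx[0]))]
    for x in range(1, n_max + 1):
        f.append(lam / x * sum(y * fx[y] * f[x - y]
                               for y in range(1, min(x, m) + 1)))
    return f

def self_convolve(f, n_max):
    """Convolution of f with itself, kept on 0, ..., n_max."""
    return [sum(f[y] * f[x - y] for y in range(x + 1))
            for x in range(n_max + 1)]

def compound_poisson_by_doubling(lam, fx, n_max, n):
    """Run the recursion with mean lam / 2**n, then convolve n times."""
    f = compound_poisson(lam / 2 ** n, fx, n_max)
    for _ in range(n):
        f = self_convolve(f, n_max)
    return f

if __name__ == "__main__":
    fx = [0.1, 0.5, 0.4]               # hypothetical severity
    print(compound_poisson_by_doubling(8.0, fx, 10, 3)[:4])
```

The value of `n` needs only to be large enough that the starting value for mean lam / 2**n is representable; each additional doubling adds one more convolution.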
6.6.3 Numerical stability


Any recursive formula requires accurate computation of values because each 
such value will be used in computing subsequent values. Recursive schemes 
suffer the risk of errors propagating through all subsequent values and po- 
tentially blowing up. In the recursive formula (6.14), errors are introduced 
through rounding or truncation at each stage because computers represent 
numbers with a finite number of significant digits. The question about stabil- 
ity is, “How fast do the errors in the calculations grow as the computed values 
are used in successive computations?” 

The question of error propagation in recursive formulas has been a sub- 
ject of study of numerical analysts. This work has been extended by Panjer 
and Wang [104] to study the recursive formula (6.14). The analysis is quite 
complicated and well beyond the scope of this book. However, some general 
conclusions can be made here. 

Errors are introduced in subsequent values through the summation

\[ \sum_{y=1}^{x} \left(a + \frac{by}{x}\right) f_X(y)\, f_S(x-y) \]

in recursion (6.14). In the extreme right-hand tail of the distribution of $S$,
this sum is positive (or at least nonnegative), and subsequent values of the
sum will be decreasing. The sum will stay positive, even with rounding errors,
when each of the three factors in each term in the sum is positive. In this case,
the recursive formula is stable, producing relative errors that do not grow fast.
For the Poisson and negative binomial based distributions, the factors in each
term are always positive.

On the other hand, for the binomial distribution, the sum can have negative 
terms because a is negative, b is positive, and y/x is a positive function not
exceeding 1. In this case, the negative terms can cause the successive values to 
blow up with alternating signs. When this occurs, the nonsensical results are 
immediately obvious. Although this does not happen frequently in practice, 
the reader should be aware of this possibility in models based on the binomial 
distribution. 


6.6.4 Continuous severity 


The recursive method has been developed for discrete severity distributions, 
while it is customary to use continuous distributions for severity. In the case of 
continuous severities, the analog of the recursion (6.14) is an integral equation, 
the solution of which is the aggregate claims distribution. 


Theorem 6.19 For the $(a,b,1)$ class of frequency distributions and any
continuous severity distribution with probability on the positive real line, the
following integral equation holds:

\[ f_S(x) = p_1 f_X(x) + \int_0^x \left(a + \frac{by}{x}\right) f_X(y)\, f_S(x-y)\, dy. \tag{6.19} \]


The proof of this result is beyond the scope of this book. For a detailed 
proof, see Theorems 6.14.1 and 6.16.1 of Panjer and Willmot [106], along with 
the associated corollaries. They consider the more general (a,b, m) class of 
distributions, which allow for arbitrary modification of m initial values of the 
distribution. Note that the initial term is $p_1 f_X(x)$, not $[p_1 - (a+b)p_0] f_X(x)$
as in (6.14). Also, (6.19) holds for members of the (a,b, 0) class as well. 

Integral equations of the form (6.19) are Volterra integral equations of the 
second kind. Numerical solution of this type of integral equation has been 
studied in the text by Baker [10]. We will develop a method using a discrete 
approximation of the severity distribution in order to use the recursive method 
(6.14) and avoid the more complicated methods of Baker [10]. 


6.6.5 Constructing arithmetic distributions 


In order to implement recursive methods, the easiest approach is to construct a 
discrete severity distribution on multiples of a convenient unit of measurement 
h, the span. Such a distribution is called arithmetic because it is defined on 
the nonnegative integers. In order to arithmetize a distribution, it is important 
to preserve the properties of the original distribution both locally through the 
range of the distribution and globally—that is, for the entire distribution. 
This should preserve the general shape of the distribution and at the same 
time preserve global quantities such as moments. 

The methods suggested here apply to the discretization (arithmetization) 
of continuous, mixed, and nonarithmetic discrete distributions. 


6.6.5.1 Method of rounding (mass dispersal) Let $f_j$ denote the probability
placed at $jh$, $j = 0, 1, 2, \ldots$. Then set²

\[
\begin{aligned}
f_0 &= \Pr\left(X < \frac{h}{2}\right) = F_X\left(\frac{h}{2} - 0\right), \\
f_j &= \Pr\left(jh - \frac{h}{2} \le X < jh + \frac{h}{2}\right) \\
&= F_X\left(jh + \frac{h}{2} - 0\right) - F_X\left(jh - \frac{h}{2} - 0\right), \qquad j = 1, 2, \ldots.
\end{aligned}
\]

This method splits the probability between $(j+1)h$ and $jh$ and assigns it
to $j+1$ and $j$. This, in effect, rounds all amounts to the nearest convenient
monetary unit, $h$, the span of the distribution.

²The notation $F_X(x - 0)$ indicates that discrete probability at $x$ should not be included.
For continuous distributions this will make no difference.
6.6.5.2 Method of local moment matching In this method we construct an
arithmetic distribution that matches $p$ moments of the arithmetic and the true
severity distributions. Consider an arbitrary interval of length $ph$, denoted
by $[x_k, x_k + ph)$. We will locate point masses $m_0^k, m_1^k, \ldots, m_p^k$ at points $x_k$,
$x_k + h, \ldots, x_k + ph$ so that the first $p$ moments are preserved. The system of
$p + 1$ equations reflecting these conditions is

\[ \sum_{j=0}^{p} (x_k + jh)^r m_j^k = \int_{x_k - 0}^{x_k + ph - 0} x^r \, dF_X(x), \qquad r = 0, 1, 2, \ldots, p, \tag{6.20} \]

where the notation “−0” at the limits of the integral indicates that discrete
probability at $x_k$ is to be included but discrete probability at $x_k + ph$ is to be
excluded.

Arrange the intervals so that $x_{k+1} = x_k + ph$ and so the endpoints coincide.
Then the point masses at the endpoints are added together. With $x_0 = 0$, the
resulting discrete distribution has successive probabilities:

\[
\begin{aligned}
f_0 &= m_0^0, & f_1 &= m_1^0, & f_2 &= m_2^0, \ldots, \\
f_p &= m_p^0 + m_0^1, & f_{p+1} &= m_1^1, & f_{p+2} &= m_2^1, \ldots.
\end{aligned}
\tag{6.21}
\]

By summing (6.20) for all possible values of $k$, with $x_0 = 0$, it is clear
that the first $p$ moments are preserved for the entire distribution and that the
probabilities add to 1 exactly. It only remains to solve the system of equations
(6.20).

Theorem 6.20 The solution of (6.20) is

\[ m_j^k = \int_{x_k - 0}^{x_k + ph - 0} \prod_{i \ne j} \frac{x - x_k - ih}{(j - i)h} \, dF_X(x), \qquad j = 0, 1, \ldots, p. \tag{6.22} \]

Proof: The Lagrange formula for collocation of a polynomial $f(y)$ at points
$y_0, y_1, \ldots, y_n$ is

\[ f(y) = \sum_{j=0}^{n} f(y_j) \prod_{i \ne j} \frac{y - y_i}{y_j - y_i}. \]

Applying this formula to the polynomial $f(y) = y^r$ over the points $x_k$,
$x_k + h, \ldots, x_k + ph$ yields

\[ x^r = \sum_{j=0}^{p} (x_k + jh)^r \prod_{i \ne j} \frac{x - x_k - ih}{(j - i)h}, \qquad r = 0, 1, \ldots, p. \]

Integrating over the interval $[x_k, x_k + ph)$ with respect to the severity
distribution results in

\[ \int_{x_k - 0}^{x_k + ph - 0} x^r \, dF_X(x) = \sum_{j=0}^{p} (x_k + jh)^r m_j^k, \]

where $m_j^k$ is given by (6.22). Hence, the solution (6.22) preserves the first $p$
moments, as required. □

Example 6.21 Suppose $X$ has the exponential distribution with pdf $f(x) =
0.1e^{-0.1x}$. Use a span of $h = 2$ to discretize this distribution by the method of
rounding and by matching the first moment.

For the method of rounding, the general formulas are

\[
\begin{aligned}
f_0 &= F(1) = 1 - e^{-0.1(1)} = 0.09516, \\
f_j &= F(2j + 1) - F(2j - 1) = e^{-0.1(2j-1)} - e^{-0.1(2j+1)}.
\end{aligned}
\]

The first few values are given in Table 6.9.

For matching the first moment we have $p = 1$ and $x_k = 2k$. The key
equations become

\[
\begin{aligned}
m_0^k &= \int_{2k}^{2k+2} \frac{x - 2k - 2}{-2}\,(0.1)e^{-0.1x}\,dx = 5e^{-0.1(2k+2)} - 4e^{-0.1(2k)}, \\
m_1^k &= \int_{2k}^{2k+2} \frac{x - 2k}{2}\,(0.1)e^{-0.1x}\,dx = -6e^{-0.1(2k+2)} + 5e^{-0.1(2k)},
\end{aligned}
\]

and then

\[
\begin{aligned}
f_0 &= m_0^0 = 5e^{-0.2} - 4 = 0.09365, \\
f_j &= m_1^{j-1} + m_0^j = 5e^{-0.1(2j-2)} - 10e^{-0.1(2j)} + 5e^{-0.1(2j+2)}.
\end{aligned}
\]

The first few values also are given in Table 6.9. A more direct solution for
matching the first moment is provided in Exercise 6.36. □

Table 6.9 Discretization of the exponential distribution by two methods

 j    f_j rounding    f_j matching
 0    0.09516         0.09365
 1    0.16402         0.16429
 2    0.13429         0.13451
 3    0.10995         0.11013
 4    0.09002         0.09017
 5    0.07370         0.07382
 6    0.06034         0.06044
 7    0.04940         0.04948
 8    0.04045         0.04051
 9    0.03311         0.03317
10    0.02711         0.02716

This method of local moment matching was introduced by Gerber and
Jones [44] and Gerber [43] and further studied by Panjer and Lutek [103] for
a variety of empirical and analytical severity distributions. In assessing the
impact of errors on aggregate stop-loss net premiums (aggregate excess-of-loss
pure premiums), Panjer and Lutek [103] found that two moments were usually
sufficient and that adding a third moment requirement adds only marginally
to the accuracy. Furthermore, the rounding method and the first-moment
method ($p = 1$) had similar errors while the second-moment method ($p = 2$)
provided significant improvement. The specific formulas for the method of
rounding and the method of matching the first moment are given in
Appendix E. A reason to favor matching zero or one moment is that the resulting
probabilities will always be nonnegative. When matching two or more
moments, this cannot be guaranteed.

The methods described here are qualitatively similar to numerical methods
used to solve Volterra integral equations such as (6.19) developed in numerical
analysis (see, for example, Baker [10]).
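
Both discretization methods are easy to code, and the exponential case of Example 6.21 provides a check against Table 6.9. The sketch below makes two assumptions that should be read as such: the severity is continuous, so $F_X(x - 0) = F_X(x)$, and the first-moment probabilities are computed from limited expected values as $f_0 = 1 - E(X \wedge h)/h$ and $f_j = [2E(X \wedge jh) - E(X \wedge (j-1)h) - E(X \wedge (j+1)h)]/h$, the form that Exercise 6.36 asks the reader to derive. The function names are hypothetical.

```python
import math

def discretize_rounding(cdf, h, n):
    """Method of rounding on 0, h, ..., n*h for a continuous cdf."""
    f = [cdf(h / 2.0)]
    f += [cdf(j * h + h / 2.0) - cdf(j * h - h / 2.0) for j in range(1, n + 1)]
    return f

def discretize_first_moment(lev, h, n):
    """Local matching of the mean (p = 1); lev(u) returns E(X ^ u)."""
    f = [1.0 - lev(h) / h]
    f += [(2.0 * lev(j * h) - lev((j - 1) * h) - lev((j + 1) * h)) / h
          for j in range(1, n + 1)]
    return f

# Exponential severity with mean 10, as in Example 6.21.
exp_cdf = lambda x: 1.0 - math.exp(-0.1 * x)
exp_lev = lambda u: 10.0 * (1.0 - math.exp(-0.1 * u))

rounding = discretize_rounding(exp_cdf, 2.0, 10)
matching = discretize_first_moment(exp_lev, 2.0, 10)

if __name__ == "__main__":
    for j, (u, w) in enumerate(zip(rounding, matching)):
        print(f"{j:2d}  {u:.5f}  {w:.5f}")
```

Passing the distribution function or limited expected value as a callable keeps the discretization code independent of the particular severity model.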


6.6.6 Exercises 


6.36 Show that the method of local moment matching with $p = 1$ (matching
total probability and the mean) using (6.21) and (6.22) results in

\[
\begin{aligned}
f_0 &= 1 - \frac{E(X \wedge h)}{h}, \\
f_i &= \frac{2E(X \wedge ih) - E[X \wedge (i-1)h] - E[X \wedge (i+1)h]}{h}, \qquad i = 1, 2, \ldots,
\end{aligned}
\]

and that $\{f_i;\ i = 0, 1, 2, \ldots\}$ forms a valid distribution with the same mean
as the original severity distribution. Using the formula given here, verify the
formula given in Example 6.21.


6.37 You are the agent for a baseball player who desires an incentive contract
that will pay the amounts given in Table 6.10. The number of times at bat
has a Poisson distribution with $\lambda = 200$. The parameter $x$ is determined so
that the probability of the player earning at least 4,000,000 is 0.95. Determine
the player’s expected compensation.

Table 6.10 Data for Exercise 6.37

Type of hit    Probability of hit per time at bat    Compensation per hit
Single         0.14                                  x
Double         0.05                                  2x
Triple         0.02                                  3x
Home run       0.03                                  4x


6.38 A weighted average of two Poisson distributions

\[ p_k = w\,\frac{e^{-\lambda_1}\lambda_1^k}{k!} + (1 - w)\,\frac{e^{-\lambda_2}\lambda_2^k}{k!} \]

has been used by some authors, e.g., Tröbliger [130], to treat drivers as either
“good” or “bad” (see Example 4.64).

(a) Find the pgf $P_N(z)$ of the number of losses in terms of the two pgfs
$P_1(z)$ and $P_2(z)$ of the number of losses of the two types of drivers.

(b) Let $f_X(x)$ denote a severity distribution defined on the nonnegative
integers. How can (6.16) be used to compute the distribution of
aggregate claims for the entire group?

(c) Can this be extended to other frequency distributions?


6.39 A compound Poisson aggregate loss model has five expected claims per 
year. The severity distribution is defined on positive multiples of 1,000. Given 
that $f_S(1) = e^{-5}$ and $f_S(2) = \frac{5}{2}e^{-5}$, determine $f_X(2)$.


6.40 For a compound Poisson distribution, $\lambda = 6$ and individual losses have
pf $f_X(1) = f_X(2) = f_X(4) = \frac{1}{3}$. Some of the pf values for the aggregate
distribution $S$ are given in Table 6.11. Determine $f_S(6)$.

Table 6.11 Data for Exercise 6.40

x    f_S(x)
3    0.0132
4    0.0215
5    0.0271
6    f_S(6)
7    0.0410

6.41 Consider the $(a,b,0)$ class of frequency distributions and any severity
distribution defined on the positive integers $\{1, 2, \ldots, M < \infty\}$, where $M$ is
the maximum possible single loss.

(a) Show that for the compound distribution the following backward
recursion holds:

\[ f_S(x) = \frac{f_S(x + M) - \sum_{y=1}^{M-1} \left(a + b\,\frac{M - y}{M + x}\right) f_X(M - y)\, f_S(x + y)}{\left(a + b\,\frac{M}{M + x}\right) f_X(M)}. \]

(b) For the binomial $(m, q)$ frequency distribution, how can the above
formula be used to obtain the distribution of aggregate losses? See
Panjer and Wang [104].

6.42 Aggregate claims are compound Poisson with $\lambda = 2$, $f_X(1) = \frac{1}{4}$, and
$f_X(2) = \frac{3}{4}$. For a premium of 6 an insurer covers aggregate claims and agrees
to pay a dividend (a refund of premium) equal to the excess, if any, of 75% of
the premium over 100% of the claims. Determine the excess of premium over
expected claims and dividends.

6.43 On a given day, a physician provides medical care to $N_A$ adults and $N_C$
children. Assume $N_A$ and $N_C$ have Poisson distributions with parameters
3 and 2, respectively. The distributions of length of care per patient are as
follows:

              Adult    Child
1 hour        0.4      0.9
2 hour        0.6      0.1

Let $N_A$, $N_C$, and the lengths of care for all individuals be independent.
The physician charges 200 per hour of patient care. Determine the probability
that the office income on a given day is less than or equal to 800.

6.44 A group policyholder’s aggregate claims, $S$, has a compound Poisson
distribution with $\lambda = 1$ and all claim amounts equal to 2. The insurer pays
the group the following dividend:

\[ D = \begin{cases} 6 - S, & S < 6, \\ 0, & S \ge 6. \end{cases} \]

Determine $E[D]$.

6.45 You are given two independent compound Poisson random variables $S_1$
and $S_2$, where $f_j(x)$, $j = 1, 2$, are the two single-claim size distributions. You
are given $\lambda_1 = \lambda_2 = 1$, $f_1(1) = 1$, and $f_2(1) = f_2(2) = 0.5$. Let $F_X(x)$
be the single-claim size distribution function associated with the compound
distribution $S = S_1 + S_2$. Calculate $F_X^{*4}(6)$.

6.46 The variable $S$ has a compound Poisson claims distribution with the
following:

1. Individual claim amounts equal to 1, 2, or 3.
2. $E(S) = 56$.
3. $\mathrm{Var}(S) = 126$.
4. $\lambda = 29$.

Determine the expected number of claims of size 2.

6.47 For a compound Poisson distribution with positive integer claim amounts,
the probability function follows:

\[ f_S(x) = \frac{1}{x}\,[0.16 f_S(x - 1) + k f_S(x - 2) + 0.72 f_S(x - 3)], \qquad x = 1, 2, 3, \ldots. \]

The expected value of aggregate claims is 1.68. Determine the expected
number of claims.

6.48 For a portfolio of policies you are given the following:

1. The number of claims has a Poisson distribution.

2. Claim amounts can be 1, 2, or 3.

3. A stop-loss reinsurance contract has net premiums for various deductibles
as given in Table 6.12.

Determine the probability that aggregate claims will be either 5 or 6.

Table 6.12 Data for Exercise 6.48

Deductible    Net premium
4             0.20
5             0.10
6             0.04
7             0.02

6.49 For group disability income insurance, the expected number of
disabilities per year is 1 per 100 lives covered. The continuance (survival) function
for the length of a disability in days, $Y$, is

\[ \Pr(Y > y) = 1 - \frac{y}{10}, \qquad y = 0, 1, \ldots, 10. \]

The benefit is 20 per day following a five-day waiting period. Using a
compound Poisson distribution, determine the variance of aggregate claims for a
group of 1,500 independent lives.

6.50 A population has two classes of drivers. The number of accidents per
individual driver has a geometric distribution. For a driver selected at random
from Class I, the geometric distribution parameter has a uniform distribution
over the interval (0,1). Twenty-five percent of the drivers are in Class I. All
drivers in Class II have expected number of claims 0.25. For a driver selected
at random from this population, determine the probability of exactly two
accidents.


Note: The following two Exercises require the use of a computer. 


6.51 A policy covers physical damage incurred by the trucks in a company’s 
fleet. The number of losses in a year has a Poisson distribution with $\lambda = 5$.
The amount of a single loss has a gamma distribution with $\alpha = 0.5$ and
$\theta = 2{,}500$. The insurance contract pays a maximum annual benefit of 20,000.
Determine the probability that the maximum benefit will be paid. Use a span 
of 100 and the method of rounding. 


6.52 An individual has purchased health insurance for which he pays 10 for 
each physician visit and 5 for each prescription. The probability that a pay- 
ment will be 10 is 0.25, and the probability that it will be 5 is 0.75. The 
total number of payments per year has the Poisson-Poisson (Neyman Type
A) distribution with $\lambda_1 = 10$ and $\lambda_2 = 4$. Determine the probability that
total payments in one year will exceed 400. Compare your answer to a normal
approximation. 


6.53 Demonstrate that if the exponential distribution is discretized by the
method of rounding, the resulting discrete distribution is a ZM geometric
distribution.


6.7 THE IMPACT OF INDIVIDUAL POLICY MODIFICATIONS ON 
AGGREGATE PAYMENTS 


In Section 5.6 the manner in which individual deductibles (both ordinary and 
franchise) affect both the individual loss amounts and the claim frequency 
distribution was discussed. In this section we will consider the impact on 
aggregate losses. It is worth noting that both individual coinsurance and 
individual policy limits have an impact on the individual losses but not on 
the frequency of such losses, so we will focus primarily on the deductible 
issues in what follows. We also remark that we continue to assume that the 
presence of policy modifications does not have an underwriting impact on 
the individual loss distribution through an effect on the risk characteristics of 
the insured population, an issue which was discussed in Section 5.6. That is, 
the ground-up distribution of the individual loss amount X is assumed to be 
unaffected by the policy modifications, and only the payments themselves are 
affected. 

From the standpoint of the aggregate losses, the relevant facts are now
described. Regardless of whether the deductible is of the ordinary or
franchise type, we shall assume that an individual loss results in a payment with
probability $v$. The individual ground-up loss random variable $X$ has
policy modifications (including deductibles) applied, so that a payment is then
made. Individual payments may then be viewed on a per-loss basis, where
the amount of such payment, denoted by $Y^L$, will be 0 if the loss results in
no payment. Thus, on a per-loss basis, the payment amount is determined on
each and every loss. Alternatively, individual payments may also be viewed
on a per-payment basis. In this case, the amount of payment is denoted by
$Y^P$, and on this basis payment amounts are only determined on losses which
actually result in a nonzero payment being made. Therefore, by definition,
$\Pr(Y^P = 0) = 0$, and the distribution of $Y^P$ is the conditional distribution of
$Y^L$ given that $Y^L > 0$. Notationally, we write $Y^P = Y^L \mid Y^L > 0$. Therefore,
the cumulative distribution functions are related by

\[ F_{Y^L}(y) = (1 - v) + vF_{Y^P}(y), \qquad y \ge 0, \]

because $1 - v = \Pr(Y^L = 0) = F_{Y^L}(0)$ (recall that $Y^L$ has a discrete
probability mass point $1 - v$ at 0, even if $X$ and hence $Y^P$ and $Y^L$ have continuous
probability density functions for $y > 0$). The moment generating functions of
$Y^L$ and $Y^P$ are thus related by

\[ M_{Y^L}(t) = (1 - v) + vM_{Y^P}(t), \tag{6.23} \]

which may be restated in terms of expectations as

\[ E(e^{tY^L}) = E(e^{tY^L} \mid Y^L = 0)\Pr(Y^L = 0) + E(e^{tY^L} \mid Y^L > 0)\Pr(Y^L > 0). \]

It follows from Section 5.6 that the number of losses $N^L$ and the number
of payments $N^P$ are related through their probability generating functions by

\[ P_{N^P}(z) = P_{N^L}(1 - v + vz), \tag{6.24} \]

where $P_{N^P}(z) = E(z^{N^P})$ and $P_{N^L}(z) = E(z^{N^L})$.

We now turn to the analysis of the aggregate payments. On a per-loss
basis, the total payments may be expressed as $S = Y_1^L + Y_2^L + \cdots + Y_{N^L}^L$
with $S = 0$ if $N^L = 0$ and where $Y_j^L$ is the payment amount on the $j$th loss.
Alternatively, ignoring losses on which no payment is made, we may express
the total payments on a per-payment basis as $S = Y_1^P + Y_2^P + \cdots + Y_{N^P}^P$
with $S = 0$ if $N^P = 0$, and $Y_j^P$ is the payment amount on the $j$th loss which
results in a nonzero payment. Clearly, $S$ may be represented in two distinct
ways on an aggregate basis. Of course, the moment generating function of $S$
on a per-loss basis is

\[ M_S(t) = E(e^{tS}) = P_{N^L}[M_{Y^L}(t)], \tag{6.25} \]

whereas on a per-payment basis we have

\[ M_S(t) = E(e^{tS}) = P_{N^P}[M_{Y^P}(t)]. \tag{6.26} \]


Obviously, (6.25) and (6.26) are equal, as may be seen from (6.23) and (6.24).
That is,

\[ P_{N^L}[M_{Y^L}(t)] = P_{N^L}[1 - v + vM_{Y^P}(t)] = P_{N^P}[M_{Y^P}(t)]. \]

Consequently, any analysis of the aggregate payments $S$ may be done on
either a per-loss basis [with compound representation (6.25) for the moment
generating function] or on a per-payment basis [with (6.26) as the compound
moment generating function]. The basis selected should obviously be
determined by whatever is more suitable for the particular situation at hand. While
by no means a hard-and-fast rule, the authors have found it more convenient
to use the per-loss basis to evaluate moments of $S$. In particular, the formulas
given in Section 5.5 for the individual mean and variance are on a per-loss
basis, and the mean and variance of the aggregate payments $S$ may be computed
using these and (6.6) but with $N$ replaced by $N^L$ and $X$ by $Y^L$.
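
The equivalence of the two representations can also be checked numerically. The sketch below assumes a Poisson number of losses (so that the number of payments is Poisson with mean $\lambda v$) and a small hypothetical discrete ground-up loss distribution with an ordinary deductible of 2; all numbers and names are illustrative only.

```python
import math

def compound_poisson(lam, fy, n_max):
    """Compound Poisson recursion (6.16) for payment pf fy on 0, ..., m."""
    m = len(fy) - 1
    g = [math.exp(-lam * (1.0 - fy[0]))]
    for x in range(1, n_max + 1):
        g.append(lam / x * sum(y * fy[y] * g[x - y]
                               for y in range(1, min(x, m) + 1)))
    return g

lam = 3.0                                  # expected number of losses
fx = {1: 0.3, 2: 0.3, 3: 0.2, 4: 0.1, 5: 0.1}   # hypothetical ground-up losses
d = 2                                      # ordinary deductible

# Per-loss payment Y^L = max(X - d, 0), with a mass of 1 - v at zero.
f_loss = [0.0] * (max(fx) - d + 1)
for x, p in fx.items():
    f_loss[max(x - d, 0)] += p
v = 1.0 - f_loss[0]

# Per-payment amount Y^P = Y^L given Y^L > 0.
f_pay = [0.0] + [p / v for p in f_loss[1:]]

per_loss = compound_poisson(lam, f_loss, 20)
per_payment = compound_poisson(lam * v, f_pay, 20)

if __name__ == "__main__":
    print(max(abs(u - w) for u, w in zip(per_loss, per_payment)))
```

The two lists agree to within floating-point rounding, as (6.25) and (6.26) require.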

On the other hand, if the (approximated) distribution of $S$ is of interest,
then a per-payment basis is normally to be preferred. The reason for this choice is
that on a per-loss basis underflow problems may result if $E(N^L)$ is large, and
computer storage problems may occur due to the presence of a large number
of zero probabilities in the distribution of $Y^L$, particularly if a franchise
deductible is employed. Also, for convenience, we normally elect to apply policy
modifications to the individual loss distribution first and then discretize (if
necessary), rather than discretizing and then applying policy modifications to
the discretized distributions. This issue is only relevant if the deductible and
policy limit are not integer multiples of the discretization span, however. The
following example illustrates these ideas.


Example 6.22 The number of ground-up losses is Poisson distributed with
mean $\lambda = 3$. The individual loss distribution is Pareto with parameters $\alpha = 4$
and $\theta = 10$. An individual ordinary deductible of 6, coinsurance of 75%,
and an individual loss limit of 24 (before application of the deductible and 
coinsurance) are all applied. Determine the mean, variance, and distribution 
of aggregate payments. 


We will first compute the mean and variance on a per-loss basis. The mean
number of losses is $E(N^L) = 3$, and the mean individual payment on a per-loss
basis is (using Theorem 5.13 with $r = 0$ and the Pareto distribution)

\[ E(Y^L) = 0.75\,[E(X \wedge 24) - E(X \wedge 6)] = 0.75(3.2485 - 2.5195) = 0.54675. \]

The mean of the aggregate payments is thus

\[ E(S) = E(N^L)E(Y^L) = (3)(0.54675) = 1.64. \]


The second moment of the individual payments on a per-loss basis is, using
Theorem 5.14 with $r = 0$ and the Pareto distribution,

\[
\begin{aligned}
E[(Y^L)^2] &= (0.75)^2 \{E[(X \wedge 24)^2] - E[(X \wedge 6)^2] - 2(6)E(X \wedge 24) + 2(6)E(X \wedge 6)\} \\
&= (0.75)^2\,[26.3790 - 10.5469 - 12(3.2485) + 12(2.5195)] \\
&= 3.98481.
\end{aligned}
\]

In order to compute the variance of aggregate payments, we do not need
to explicitly determine $\mathrm{Var}(Y^L)$ because $S$ is compound Poisson distributed,
which implies [using (4.27), for example] that

\[ \mathrm{Var}(S) = \lambda E[(Y^L)^2] = 3(3.98481) = 11.9544 = (3.46)^2. \]


In order to compute the (approximate) distribution of $S$, we will use the
per-payment basis. First note that $v = \Pr(X > 6) = [10/(10 + 6)]^4 = 0.15259$,
and the number of payments $N^P$ is Poisson distributed with mean $E(N^P) =
\lambda v = 3(0.15259) = 0.45776$. Let $Z = X - 6 \mid X > 6$, so that $Z$ is the individual
payment random variable with only a deductible of 6. Then

\[ \Pr(Z > z) = \frac{\Pr(X > z + 6)}{\Pr(X > 6)}. \]

With coinsurance of 75%, $Y^P = 0.75Z$ has cumulative distribution function

\[ F_{Y^P}(y) = 1 - \Pr(0.75Z > y) = 1 - \frac{\Pr(X > 6 + y/0.75)}{\Pr(X > 6)}. \]

That is, for $y$ less than the maximum payment of $(0.75)(24 - 6) = 13.5$,

\[ F_{Y^P}(y) = \frac{\Pr(X > 6) - \Pr(X > 6 + y/0.75)}{\Pr(X > 6)}, \qquad y < 13.5, \]

and $F_{Y^P}(y) = 1$ for $y \ge 13.5$. We then discretize the distribution of $Y^P$
(we thus apply the policy modifications first and then discretize) using a
span of 2.25 and the method of rounding. This yields $f_0 = F_{Y^P}(1.125) =
0.30124$, $f_1 = F_{Y^P}(3.375) - F_{Y^P}(1.125) = 0.32768$, and so on. In this situation
care must be exercised in evaluation of $f_6$, and we have $f_6 = F_{Y^P}(14.625) -
F_{Y^P}(12.375) = 1 - 0.94126 = 0.05874$. Then $f_n = 1 - 1 = 0$ for $n = 7, 8, \ldots$.
The approximate distribution of $S$ may then be computed using the compound
Poisson recursive formula, namely, $g_0 = e^{-0.45776(1 - 0.30124)} = 0.72625$, and

\[ g_k = \frac{0.45776}{k} \sum_{j=1}^{k} j f_j\, g_{k-j}, \qquad k = 1, 2, 3, \ldots. \]

Thus, $g_1 = (0.45776)(1)(0.32768)(0.72625) = 0.10894$, for example. □
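
The per-payment calculation in Example 6.22 can be reproduced as follows. This is a sketch, not the authors' program: the variable names are illustrative, the Pareto survival function $[\theta/(\theta + x)]^\alpha$ and limited expected value $\frac{\theta}{\alpha - 1}\{1 - [\theta/(\theta + u)]^{\alpha - 1}\}$ are the standard forms for this parameterization, and the results agree with the printed values to rounding.

```python
import math

alpha, theta, lam = 4.0, 10.0, 3.0     # Pareto severity, Poisson frequency
d, u, c = 6.0, 24.0, 0.75              # deductible, loss limit, coinsurance
h = 2.25                               # discretization span

def sf(x):
    """Pareto survival function Pr(X > x)."""
    return (theta / (theta + x)) ** alpha

def lev(x):
    """Pareto limited expected value E(X ^ x), valid for alpha != 1."""
    return theta / (alpha - 1.0) * (1.0 - (theta / (theta + x)) ** (alpha - 1.0))

v = sf(d)                              # probability that a loss yields a payment
mean_s = lam * c * (lev(u) - lev(d))   # E(S) on a per-loss basis

def cdf_yp(y):
    """Distribution function of the per-payment amount Y^P."""
    if y >= c * (u - d):
        return 1.0
    return 1.0 - sf(d + y / c) / v

# Method of rounding applied after the policy modifications.
f = [cdf_yp(h / 2.0)]
f += [cdf_yp(j * h + h / 2.0) - cdf_yp(j * h - h / 2.0) for j in range(1, 7)]

# Compound Poisson recursion on the per-payment basis.
lam_p = lam * v
g = [math.exp(-lam_p * (1.0 - f[0]))]
for k in range(1, 11):
    g.append(lam_p / k * sum(j * f[j] * g[k - j]
                             for j in range(1, min(k, 6) + 1)))

if __name__ == "__main__":
    print(round(v, 5), round(mean_s, 2))
    print([round(p, 5) for p in f])
    print([round(p, 5) for p in g[:4]])
```

Here `g[k]` approximates the probability that aggregate payments equal $2.25k$.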


6.7.1 Exercises 


6.54 Suppose that the number of ground-up losses $N^L$ has probability
generating function $P_{N^L}(z) = B[\theta(z - 1)]$, where $\theta$ is a parameter and $B$ is
functionally independent of $\theta$. The individual ground-up loss distribution is
exponential with cumulative distribution function $F_X(x) = 1 - e^{-\mu x}$, $x \ge 0$.
Individual losses are subject to an ordinary deductible of $d$ and coinsurance of
$\alpha$. Demonstrate that the aggregate payments, on a per-payment basis, have
compound moment generating function given by (6.26), where $N^P$ has the
same distribution as $N^L$ but with $\theta$ replaced by $\theta e^{-\mu d}$ and $Y^P$ has the same
distribution as $X$ but with $\mu$ replaced by $\mu/\alpha$.


6.55 A ground-up model of individual losses has the gamma distribution
with parameters $\alpha = 2$ and $\theta = 100$. The number of losses has the negative
binomial distribution with $r = 2$ and $\beta = 1.5$. An ordinary deductible of 50
and a loss limit of 175 (before imposition of the deductible) are applied to
each individual loss.


(a) Determine the mean and variance of the aggregate payments on a 
per-loss basis. 


(b) Determine the distribution of the number of payments. 


(c) Determine the cumulative distribution function of the amount $Y^P$
of a payment given that a payment is made.


(d) Discretize the severity distribution from (c) using the method of 
rounding and a span of 40. 


(e) Use the recursive formula to calculate the discretized distribution 
of aggregate payments up to a discretized amount paid of 120. 


6.8 CALCULATIONS WITH APPROXIMATE DISTRIBUTIONS


Whenever the severity distribution is calculated using an approximate method, 
the result is, of course, an approximation to the true aggregate distribution. 
In particular, the true aggregate distribution is often continuous (except, per- 
haps, with discrete probability at zero or at an aggregate censoring limit) while 
the approximate distribution either is discrete with probability at equally 
spaced values as with recursion and Fast Fourier Transform (FFT), is discrete 
with probability 1/n at arbitrary values as with simulation, or has a piecewise 
linear distribution function as with Heckman-Meyers. In this section we
introduce reasonable ways to obtain values of $F_S(x)$ and $E[(S \wedge x)^k]$ from those
approximating distributions. In all cases we assume that the true distribution
of aggregate payments is continuous, except perhaps with discrete probability 
at S = 0. 


6.8.1 Arithmetic distributions 


For recursion and the FFT, the approximating distribution can be written
as $p_0, p_1, \ldots$, where $p_j = \Pr(S^* = jh)$ and $S^*$ refers to the approximating
distribution. While several methods of undiscretizing this distribution are
possible, we will introduce only one. It assumes we can obtain $g_0 = \Pr(S = 0)$,
the true probability that aggregate payments are zero. The method is based
on constructing a continuous approximation to $S^*$ by assuming the probability
$p_j$ is uniformly spread over the interval $(j - \frac{1}{2})h$ to $(j + \frac{1}{2})h$ for $j = 1, 2, \ldots$.
For the interval from 0 to h/2, a discrete probability of $g_0$ is placed at zero
and the remaining probability, $p_0 - g_0$, is spread uniformly over the interval.
Let $S^{**}$ be the random variable with this mixed distribution. All quantities
of interest are then computed using $S^{**}$.


Table 6.13 Discrete approximation to the aggregate payments distribution

  j     x     f_X(x)      p_j = f_{S*}(x)
  0     0     0.009934    0.335556
  1     2     0.019605    0.004415
  2     4     0.019216    0.004386
  3     6     0.018836    0.004356
  4     8     0.018463    0.004327
  5    10     0.018097    0.004299
  6    12     0.017739    0.004270
  7    14     0.017388    0.004242
  8    16     0.017043    0.004214
  9    18     0.016706    0.004186
 10    20     0.016375    0.004158


Example 6.23 Let N have the geometric distribution with $\beta = 2$ and let X
have the exponential distribution with $\theta = 100$. Use recursion with a span
of 2 to approximate the aggregate distribution and then obtain a continuous
approximation.


The exponential distribution was discretized using the method which
preserves the first moment. The probabilities appear in Table 6.13. Also
presented are the aggregate probabilities computed using the recursive formula.
We also note that $g_0 = \Pr(N = 0) = (1 + \beta)^{-1} = \frac{1}{3}$. For $j = 1, 2, \ldots$ the
continuous approximation has pdf $f_{S^{**}}(x) = f_{S^*}(2j)/2$, $2j - 1 < x \le 2j + 1$. We
also have $\Pr(S^{**} = 0) = \frac{1}{3}$ and $f_{S^{**}}(x) = (0.335556 - \frac{1}{3})/1 = 0.002223$,
$0 < x \le 1$. □
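The entries of Table 6.13 can be checked numerically. The sketch below assumes that "the method which preserves the first moment" is local moment matching with a single moment, so that f_0 = 1 - E(X ∧ h)/h and f_j = [2E(X ∧ jh) - E(X ∧ (j-1)h) - E(X ∧ (j+1)h)]/h, and it uses the (a, b, 0) recursion with a = β/(1 + β) and b = 0 for the geometric frequency. Function names are illustrative.

```python
import math

BETA, THETA, SPAN = 2.0, 100.0, 2.0

def lev_exponential(x):
    """E(X ^ x) for the exponential distribution with mean THETA."""
    return THETA * (1.0 - math.exp(-x / THETA))

def discretize_first_moment(span, n_points):
    """Discretize so that the first moment is preserved on each interval."""
    f = [1.0 - lev_exponential(span) / span]
    for j in range(1, n_points):
        f.append((2.0 * lev_exponential(j * span)
                  - lev_exponential((j - 1) * span)
                  - lev_exponential((j + 1) * span)) / span)
    return f

def compound_geometric(beta, f, n_points):
    """Recursion g_k = a * sum_{j=1}^{k} f_j g_{k-j} / (1 - a f_0)."""
    a = beta / (1.0 + beta)
    g = [1.0 / (1.0 + beta * (1.0 - f[0]))]   # pgf of N evaluated at f_0
    for k in range(1, n_points):
        total = sum(f[j] * g[k - j] for j in range(1, k + 1))
        g.append(a * total / (1.0 - a * f[0]))
    return g

f = discretize_first_moment(SPAN, 11)
p = compound_geometric(BETA, f, 11)
for j in range(11):
    print(j, int(j * SPAN), round(f[j], 6), round(p[j], 6))
```

The printed rows can be compared line by line with Table 6.13.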


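Returning to the original problem, it is possible to work out the general
formulas for the basic quantities. For the cdf,

\[ F_{S^{**}}(x) = g_0 + \int_0^x \frac{2(p_0 - g_0)}{h}\,ds
   = g_0 + \frac{2x}{h}(p_0 - g_0), \quad 0 \le x \le \frac{h}{2}, \]

and

\[ F_{S^{**}}(x) = \sum_{i=0}^{j-1} p_i + \int_{(j-1/2)h}^{x} \frac{p_j}{h}\,ds
   = \sum_{i=0}^{j-1} p_i + \frac{x - (j - 1/2)h}{h}\,p_j,
   \quad \left(j - \frac{1}{2}\right)h < x \le \left(j + \frac{1}{2}\right)h. \]

For the limited expected value (LEV),

\[ E[(S^{**} \wedge x)^k] = 0^k g_0 + \int_0^x s^k \frac{2(p_0 - g_0)}{h}\,ds + x^k[1 - F_{S^{**}}(x)]
   = \frac{2x^{k+1}(p_0 - g_0)}{h(k+1)} + x^k[1 - F_{S^{**}}(x)], \quad 0 < x \le \frac{h}{2}, \]

and

\[ E[(S^{**} \wedge x)^k] = 0^k g_0 + \int_0^{h/2} s^k \frac{2(p_0 - g_0)}{h}\,ds
   + \sum_{i=1}^{j-1} \int_{(i-1/2)h}^{(i+1/2)h} s^k \frac{p_i}{h}\,ds
   + \int_{(j-1/2)h}^{x} s^k \frac{p_j}{h}\,ds + x^k[1 - F_{S^{**}}(x)] \]
\[ = \frac{(h/2)^k (p_0 - g_0)}{k+1}
   + \sum_{i=1}^{j-1} \frac{h^k[(i + 1/2)^{k+1} - (i - 1/2)^{k+1}]}{k+1}\,p_i
   + \frac{x^{k+1} - [(j - 1/2)h]^{k+1}}{h(k+1)}\,p_j
   + x^k[1 - F_{S^{**}}(x)],
   \quad \left(j - \frac{1}{2}\right)h < x \le \left(j + \frac{1}{2}\right)h. \]

For k = 1 this reduces to

\[ E(S^{**} \wedge x) =
   \begin{cases}
   x(1 - g_0) - \dfrac{x^2}{h}(p_0 - g_0), & 0 < x \le \dfrac{h}{2}, \\[2ex]
   \dfrac{h}{4}(p_0 - g_0) + \displaystyle\sum_{i=1}^{j-1} i h p_i
     + \dfrac{x^2 - [(j - 1/2)h]^2}{2h}\,p_j + x[1 - F_{S^{**}}(x)],
     & \left(j - \dfrac{1}{2}\right)h < x \le \left(j + \dfrac{1}{2}\right)h.
   \end{cases} \qquad (6.27) \]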
These formulas are summarized in Appendix E. 


Example 6.24 (Example 6.23 continued) Compute the cdf and LEV at inte- 
gral values from 1 to 10 using S*, S**, and the exact distribution of aggregate 
losses. 


The exact distribution is available for this example. It was developed in
Example 6.12 where it was determined that $\Pr(S = 0) = (1 + \beta)^{-1} = \frac{1}{3}$ and
the pdf for the continuous part is

\[ f_S(x) = \frac{\beta}{\theta(1 + \beta)^2} \exp\left[-\frac{x}{\theta(1 + \beta)}\right]
   = \frac{2}{900}\,e^{-x/300}. \]

From this we have

\[ F_S(x) = \frac{1}{3} + \int_0^x \frac{2}{900}\,e^{-s/300}\,ds = 1 - \frac{2}{3}\,e^{-x/300} \]

and

\[ E(S \wedge x) = \int_0^x s\,\frac{2}{900}\,e^{-s/300}\,ds + x\,\frac{2}{3}\,e^{-x/300}
   = 200(1 - e^{-x/300}). \]

The requested values are given in Table 6.14. □


Table 6.14 Comparison of true aggregate payment values and two approximations

                  cdf                                   LEV
  x    F_S(x)    F_{S*}(x)  F_{S**}(x)    E(S ^ x)  E(S* ^ x)  E(S** ^ x)
  1    0.335552  0.335556   0.335556      0.66556   0.66444    0.66556
  2    0.337763  0.339971   0.337763      1.32890   1.32889    1.32890
  3    0.339967  0.339971   0.339970      1.99003   1.98892    1.99003
  4    0.342163  0.344357   0.342163      2.64897   2.64895    2.64896
  5    0.344352  0.344357   0.344356      3.30571   3.30459    3.30570
  6    0.346534  0.348713   0.346534      3.96027   3.96023    3.96025
  7    0.348709  0.348713   0.348712      4.61264   4.61152    4.61263
  8    0.350876  0.353040   0.350876      5.26285   5.26281    5.26284
  9    0.353036  0.353040   0.353039      5.91089   5.90977    5.91088
 10    0.355189  0.357339   0.355189      6.55678   6.55673    6.55676
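The comparison in Table 6.14 can be reproduced from the rounded probabilities of Table 6.13. The sketch below implements the cdf and the k = 1 limited expected value of the smoothed variable S** and sets them beside the exact formulas of this example. Because it starts from six-decimal inputs, agreement with the table is only to about five decimals; the function names are illustrative.

```python
import math

H = 2.0                 # span used in the recursion
G0 = 1.0 / 3.0          # true probability that aggregate payments are zero
# p_j from Table 6.13 (rounded to six decimals in the source)
P = [0.335556, 0.004415, 0.004386, 0.004356, 0.004327, 0.004299,
     0.004270, 0.004242, 0.004214, 0.004186, 0.004158]

def interval_index(x):
    """Index j with (j - 1/2)h < x <= (j + 1/2)h; j = 0 means 0 <= x <= h/2."""
    return max(0, math.ceil(x / H - 0.5))

def cdf_smoothed(x):
    j = interval_index(x)
    if j == 0:
        return G0 + 2.0 * x / H * (P[0] - G0)
    return sum(P[:j]) + (x - (j - 0.5) * H) / H * P[j]

def lev_smoothed(x):
    """E(S** ^ x), the k = 1 case of the limited expected value."""
    j = interval_index(x)
    if j == 0:
        return x * (1.0 - G0) - x * x / H * (P[0] - G0)
    head = H / 4.0 * (P[0] - G0) + sum(i * H * P[i] for i in range(1, j))
    partial = (x * x - ((j - 0.5) * H) ** 2) / (2.0 * H) * P[j]
    return head + partial + x * (1.0 - cdf_smoothed(x))

def cdf_exact(x):
    return 1.0 - 2.0 / 3.0 * math.exp(-x / 300.0)

def lev_exact(x):
    return 200.0 * (1.0 - math.exp(-x / 300.0))

for x in range(1, 11):
    print(x, round(cdf_exact(x), 6), round(cdf_smoothed(x), 6),
          round(lev_exact(x), 5), round(lev_smoothed(x), 5))
```

The list P covers arguments up to 21; a longer run of the recursion would be needed for larger x.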


6.8.2 Empirical distributions 


When the approximate distribution is obtained by simulation (the simulation 
process is discussed in Chapter 17), the result is an empirical distribution. Un- 
like approximations produced by recursion or the FFT, simulation does not 
place the probabilities at equally spaced values. This makes it less clear how 
the approximate distribution should be smoothed. On the other hand, simu- 
lation usually involves tens or hundreds of thousands of points, and therefore 
the individual points are likely to be close to each other. For these reasons it 
seems sufficient to simply use the empirical distribution as the answer. That 
is, all calculations should be done using the approximate empirical random 
variable, S*. The formulas for the commonly required quantities are very 
simple. Let $x_1, x_2, \ldots, x_n$ be the simulated values. Then

\[ F_{S^*}(x) = \frac{\text{number of } x_j \le x}{n} \]

and

\[ E[(S^* \wedge x)^k] = \frac{1}{n} \sum_{x_j \le x} x_j^k + x^k[1 - F_{S^*}(x)]. \]


Table 6.15 Simulated values of aggregate losses

   j       x_j        j      x_j
 1-331     0         346     6.15
  332      0.04      347     6.26
  333      0.12      348     6.58
  334      0.89      349     6.68
  335      1.76      350     6.71
  336      2.16      351     6.82
  337      3.13      352     7.76
  338      3.40      353     8.23
  339      4.38      354     8.67
  340      4.78      355     8.77
  341      4.95      356     8.85
  342      5.04      357     9.18
  343      5.07      358     9.88
  344      5.81      359    10.12
  345      5.94

Example 6.25 (Example 6.23 continued) Simulate 1,000 observations from 
the compound model with geometric frequency and exponential severity. Use 
the results to obtain values of the cdf and LEV for the integers from 1 to 10. 
The small sample size was selected so that only about 30 values between zero
and 10 (not including zero) are expected. 


The simulations produced an aggregate payment of zero 331 times. The set 
of nonzero values that were less than 10 plus the first value past 10 are pre- 
sented in Table 6.15. Other than zero, none of the values appeared more than 
once in the simulation. The requested values from the empirical distribution,
along with the true values, are given in Table 6.16. □
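The empirical quantities for this example can be recomputed from the values listed in Table 6.15. In the sketch below, the 642 simulated values above 10 (which the table does not list, apart from the first) are represented by infinity; that is a stand-in chosen here, justified only because their count, not their size, matters for arguments up to 10. Function names are illustrative.

```python
import math

# Nonzero simulated values below 10 from Table 6.15 (observations 332 to 358).
BELOW_TEN = [0.04, 0.12, 0.89, 1.76, 2.16, 3.13, 3.40, 4.38, 4.78, 4.95,
             5.04, 5.07, 5.81, 5.94, 6.15, 6.26, 6.58, 6.68, 6.71, 6.82,
             7.76, 8.23, 8.67, 8.77, 8.85, 9.18, 9.88]
# 331 zeros, the listed values, and placeholders for everything above 10.
SAMPLE = [0.0] * 331 + BELOW_TEN + [math.inf] * (1000 - 331 - len(BELOW_TEN))

def empirical_cdf(sample, x):
    """Proportion of simulated values that do not exceed x."""
    return sum(1 for value in sample if value <= x) / len(sample)

def empirical_lev(sample, x, k=1):
    """(1/n) * sum of x_j^k over x_j <= x, plus x^k * [1 - F(x)]."""
    below = sum(value ** k for value in sample if value <= x) / len(sample)
    return below + x ** k * (1.0 - empirical_cdf(sample, x))

for x in range(0, 11):
    print(x, round(empirical_cdf(SAMPLE, x), 3), round(empirical_lev(SAMPLE, x), 4))
```

The printed columns correspond to the empirical cdf and LEV columns of Table 6.16.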


6.8.3 Piecewise linear cdf 


When using the Heckman—Meyers inversion method (to be introduced in Sec- 
tion 6.9.2), the output is approximate values of the cdf F(x) at any set 
of desired values. The values are approximate because the severity distri- 
bution function is required to be piecewise linear and because approximate 
integration is used. Let $S^{\#}$ denote an arbitrary random variable with cdf
values as given by the Heckman-Meyers method at arbitrarily selected points
$0 = x_1 < x_2 < \cdots < x_n$ and let $F_j = F_{S^{\#}}(x_j)$. Also, set $F_n = 1$ so that
no probability is lost. The easiest way to complete the description of the
smoothed distribution is to connect these points with straight lines. Let $S^{\#\#}$
be the random variable with this particular cdf. Intermediate values of the
cdf of $S^{\#\#}$ are found by interpolation:

\[ F_{S^{\#\#}}(x) = \frac{(x - x_{j-1})F_j + (x_j - x)F_{j-1}}{x_j - x_{j-1}}. \]

The formula for the limited expected value is (for $x_{j-1} < x \le x_j$)

\[ E[(S^{\#\#} \wedge x)^k] = \sum_{i=2}^{j-1} \int_{x_{i-1}}^{x_i} s^k \frac{F_i - F_{i-1}}{x_i - x_{i-1}}\,ds
   + \int_{x_{j-1}}^{x} s^k \frac{F_j - F_{j-1}}{x_j - x_{j-1}}\,ds
   + x^k[1 - F_{S^{\#\#}}(x)] \]
\[ = \sum_{i=2}^{j-1} \frac{(x_i^{k+1} - x_{i-1}^{k+1})(F_i - F_{i-1})}{(k+1)(x_i - x_{i-1})}
   + \frac{(x^{k+1} - x_{j-1}^{k+1})(F_j - F_{j-1})}{(k+1)(x_j - x_{j-1})}
   + x^k\left[1 - \frac{(x - x_{j-1})F_j + (x_j - x)F_{j-1}}{x_j - x_{j-1}}\right], \]

and when k = 1,

\[ E(S^{\#\#} \wedge x) = \sum_{i=2}^{j-1} \frac{(x_i + x_{i-1})(F_i - F_{i-1})}{2}
   + \frac{(x^2 - x_{j-1}^2)(F_j - F_{j-1})}{2(x_j - x_{j-1})}
   + x\left[1 - \frac{(x - x_{j-1})F_j + (x_j - x)F_{j-1}}{x_j - x_{j-1}}\right]. \]


Table 6.16 Empirical and smoothed values from a simulation

  x    F_{S*}(x)   F_S(x)   E(S* ^ x)   E(S ^ x)
  0    0.331       0.333    0.0000      0.0000
  1    0.334       0.336    0.6671      0.6656
  2    0.335       0.338    1.3328      1.3289
  3    0.336       0.340    1.9970      1.9900
  4    0.338       0.342    2.6595      2.6490
  5    0.341       0.344    3.3206      3.3057
  6    0.345       0.347    3.9775      3.9603
  7    0.351       0.349    4.6297      4.6126
  8    0.352       0.351    5.2784      5.2629
  9    0.356       0.353    5.9250      5.9109
 10    0.358       0.355    6.5680      6.5568

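The interpolation and limited expected value formulas for a piecewise linear cdf translate directly into code. The sketch below uses zero-based lists, so xs[0] = 0 plays the role of x_1, and the three-point grid in the usage lines is hypothetical, chosen only so the results are easy to check by hand.

```python
from bisect import bisect_left

def cdf_piecewise(xs, Fs, x):
    """Linear interpolation between the points (xs[j], Fs[j]); Fs[-1] must be 1."""
    if x <= xs[0]:
        return Fs[0] if x == xs[0] else 0.0
    if x >= xs[-1]:
        return 1.0
    j = bisect_left(xs, x)                      # xs[j-1] < x <= xs[j]
    return ((x - xs[j - 1]) * Fs[j] + (xs[j] - x) * Fs[j - 1]) / (xs[j] - xs[j - 1])

def lev_piecewise(xs, Fs, x, k=1):
    """E[(S ^ x)^k] when the cdf is piecewise linear with a point mass Fs[0] at zero."""
    if x <= xs[0]:
        return 0.0
    x = min(x, xs[-1])                          # no probability lies beyond xs[-1]
    j = bisect_left(xs, x)
    total = 0.0
    for i in range(1, j):                       # intervals lying entirely below x
        total += ((xs[i] ** (k + 1) - xs[i - 1] ** (k + 1)) * (Fs[i] - Fs[i - 1])
                  / ((k + 1) * (xs[i] - xs[i - 1])))
    total += ((x ** (k + 1) - xs[j - 1] ** (k + 1)) * (Fs[j] - Fs[j - 1])
              / ((k + 1) * (xs[j] - xs[j - 1])))
    return total + x ** k * (1.0 - cdf_piecewise(xs, Fs, x))

# Hypothetical cdf values at three points, for illustration only.
xs = [0.0, 10.0, 20.0]
Fs = [0.2, 0.6, 1.0]
print(cdf_piecewise(xs, Fs, 5.0), lev_piecewise(xs, Fs, 5.0), lev_piecewise(xs, Fs, 20.0))
```

With this grid the survival function falls linearly from 0.8 to 0.4 and then to 0, so the limited expected values can be confirmed as areas under it.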
6.8.4 Exercises


6.56 Let the frequency (of losses) distribution be negative binomial with
$r = 2$ and $\beta = 2$. Let the severity distribution (of losses) have the gamma
distribution with $\alpha = 4$ and $\theta = 25$. Determine $F_S(200)$ and $E(S \wedge 200)$ for
an ordinary per-loss deductible of 25. Use the recursive formula to obtain the
aggregate distribution and use a discretization interval of 5 with the method
of rounding to discretize the severity distribution.


6.57 (Exercise 6.51 continued) Recall that the number of claims has a Poisson
distribution with $\lambda = 5$ and the amount of a single claim has a gamma
distribution with $\alpha = 0.5$ and $\theta = 2{,}500$. Determine the mean, standard
deviation, and 90th percentile of payments by the insurance company under each
of the following coverages. Any computational method may be used. 


(a) A maximum aggregate payment of 20,000. 

(b) A per-claim ordinary deductible of 100 and a per claim maximum 
payment of 10,000. There is no aggregate maximum payment. 

(c) A per-claim ordinary deductible of 100 with no maximum payment. 
There is an aggregate ordinary deductible of 15,000, an aggregate 
coinsurance factor of 0.8, and a maximum insurance payment of 
20,000. This corresponds to an aggregate reinsurance provision. 


6.58 (Exercise 6.52 continued) Recall that the number of payments has the
Poisson-Poisson distribution with $\lambda_1 = 10$ and $\lambda_2 = 4$ while the payment per
claim by the insured is 5 with probability 0.75 and 10 with probability 0.25.
Determine the expected payment by the insured under each of the following 
situations. Any computational method may be used. 


(a) A maximum payment of 400. 


(b) A coinsurance arrangement where the insured pays 100% up to an
aggregate total of 300 and then pays 20% of aggregate payments
above 300.


6.9 INVERSION METHODS


Inversion methods discussed in this section are used to obtain numerically 
the probability function, or some related function such as a net stop-loss 
premium (aggregate excess-of-loss pure premium), from a known expression 
for a transform, such as the pgf, mgf, or cf of the desired function. 

Compound distributions lend themselves naturally to this approach because
their transforms are compound functions and are easily evaluated when
both frequency and severity components are known. The pgf and cf of the
aggregate loss distribution are

\[ P_S(z) = P_N[P_X(z)] \]

and

\[ \varphi_S(z) = E[e^{iSz}] = P_N[\varphi_X(z)], \qquad (6.28) \]



respectively. The characteristic function always exists and is unique. Con- 
versely, for a given characteristic function, there always exists a unique dis- 
tribution. The objective of inversion methods is to obtain the distribution 
numerically from the characteristic function (6.28). 

It is worth mentioning that there has recently been much research in other 
areas of applied probability on obtaining the distribution numerically from 
the associated Laplace-Stieltjes transform. These techniques are applicable 
to the evaluation of compound distributions in the present context but will 
not be discussed further here. A good survey is [2], pp. 257-323. 


6.9.1 Fast Fourier transform 


The FFT is an algorithm that can be used for inverting characteristic functions 
to obtain densities of discrete random variables. The FFT comes from the 
field of signal processing. It was first used for the inversion of characteristic 
functions of compound distributions by Bertram [14] and is explained in detail 
with applications to aggregate loss calculation by Robertson [111]. 


Definition 6.26 For any continuous function f(x), the Fourier transform
is the mapping

\[ \tilde f(z) = \int_{-\infty}^{\infty} f(x) e^{izx}\,dx. \qquad (6.29) \]

The original function can be recovered from its Fourier transform as

\[ f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde f(z) e^{-izx}\,dz. \]

When f(x) is a probability density function, $\tilde f(z)$ is its characteristic
function. For our applications, f(x) will be real valued. From (6.29), $\tilde f(z)$ is
complex valued. When f(x) is a probability function of a discrete (or mixed)
distribution, the definitions can be easily generalized (see, for example, Fisz
[38]).


Definition 6.27 Let $f_x$ denote a function defined for all integer values of x
that is periodic with period length n (that is, $f_{x+n} = f_x$ for all x). For the
vector $(f_0, f_1, \ldots, f_{n-1})$, the discrete Fourier transform is the mapping
$\tilde f_x$, $x = \ldots, -1, 0, 1, \ldots$, defined by

\[ \tilde f_k = \sum_{j=0}^{n-1} f_j \exp\left(\frac{2\pi i}{n} jk\right), \quad k = \ldots, -1, 0, 1, \ldots. \qquad (6.30) \]

This mapping is bijective. In addition, $\tilde f_k$ is also periodic with period length
n. The inverse mapping is

\[ f_j = \frac{1}{n} \sum_{k=0}^{n-1} \tilde f_k \exp\left(-\frac{2\pi i}{n} kj\right), \quad j = \ldots, -1, 0, 1, \ldots. \qquad (6.31) \]

This inverse mapping recovers the values of the original function.


Because of the periodic nature of $f$ and $\tilde f$, we can think of the discrete
Fourier transform as a bijective mapping of n points into n points. From
(6.30), it is clear that, in order to obtain n values of $\tilde f_k$, the number of terms
that need to be evaluated is of order $n^2$, that is, $O(n^2)$.

The Fast Fourier Transform (FFT) is an algorithm that reduces the
number of computations required to be of order $O(n \log_2 n)$. This can be a
dramatic reduction in computations when n is large. The algorithm exploits
the property that a discrete Fourier transform of length n can be rewritten
as the sum of two discrete transforms, each of length n/2, the first consisting
of the even-numbered points and the second consisting of the odd-numbered
points.

\[ \tilde f_k = \sum_{j=0}^{n-1} f_j \exp\left(\frac{2\pi i}{n} jk\right) \]
\[ = \sum_{j=0}^{n/2-1} f_{2j} \exp\left(\frac{2\pi i}{n} 2jk\right)
   + \sum_{j=0}^{n/2-1} f_{2j+1} \exp\left[\frac{2\pi i}{n} (2j+1)k\right] \]
\[ = \sum_{j=0}^{m-1} f_{2j} \exp\left(\frac{2\pi i}{m} jk\right)
   + \exp\left(\frac{2\pi i}{n} k\right) \sum_{j=0}^{m-1} f_{2j+1} \exp\left(\frac{2\pi i}{m} jk\right), \]

where m = n/2. Hence

\[ \tilde f_k = \tilde f_k^a + \exp\left(\frac{2\pi i}{n} k\right) \tilde f_k^b, \qquad (6.32) \]

where $\tilde f_k^a$ and $\tilde f_k^b$ denote the length-m transforms of the even-numbered
and odd-numbered points, respectively.
These can, in turn, be written as the sum of two transforms of length m/2.
This can be continued successively. For the lengths n/2, m/2, ... to be
integers, the FFT algorithm begins with a vector of length $n = 2^r$. The
successive writing of the transforms into transforms of half the length will
result, after r times, in transforms of length 1. Knowing the transform of
length 1 will allow one to successively compose the transforms of length 2,
$2^2, 2^3, \ldots, 2^r$ by simple addition using (6.32). Details of the methodology are
found in Press et al. [107].
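The halving argument can be written as a short recursive routine. The following is a minimal sketch, not the optimized algorithm of Press et al.: it applies (6.32) directly, uses the sign convention of (6.30) and (6.31), and checks itself against the direct O(n^2) sum. The function names are illustrative.

```python
import cmath

def dft_direct(f):
    """Equation (6.30) evaluated term by term; O(n^2) operations."""
    n = len(f)
    return [sum(f[idx] * cmath.exp(2 * cmath.pi * 1j * idx * k / n) for idx in range(n))
            for k in range(n)]

def fft_recursive(f):
    """Radix-2 transform built on (6.32); len(f) must be a power of 2."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    even = fft_recursive(f[0::2])      # transform of the even-numbered points
    odd = fft_recursive(f[1::2])       # transform of the odd-numbered points
    m = n // 2
    out = [0j] * n
    for k in range(m):
        term = cmath.exp(2 * cmath.pi * 1j * k / n) * odd[k]
        out[k] = even[k] + term
        out[k + m] = even[k] - term    # exp(2 pi i (k + m) / n) = -exp(2 pi i k / n)
    return out

def inverse_fft(transform):
    """Equation (6.31): conjugate, transform, conjugate again, divide by n."""
    n = len(transform)
    swapped = fft_recursive([z.conjugate() for z in transform])
    return [z.conjugate() / n for z in swapped]

f = [0.0, 0.5, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0]
direct = dft_direct(f)
fast = fft_recursive(f)
recovered = [z.real for z in inverse_fft(fast)]
print(max(abs(a - b) for a, b in zip(direct, fast)))
print([round(v, 10) for v in recovered])
```

The second half of each output vector reuses the same two sub-transforms because they are periodic with period m, which is where the saving in computation comes from.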

In our applications, we use the FFT to invert the characteristic function 
when discretization of the severity distribution is done. This is carried out as 
follows: 
1. Discretize the severity distribution using some methods such as those
described in the previous section, obtaining the discretized severity
distribution

\[ f_X(0), f_X(1), \ldots, f_X(n - 1), \]

where $n = 2^r$ for some integer r and n is the number of points desired
in the distribution $f_S(x)$ of aggregate claims.


2. Apply the FFT to this vector of values, obtaining $\varphi_X(z)$, the
characteristic function of the discretized distribution. The result is also a
vector of $n = 2^r$ values.


3. Transform this vector using the pgf transformation of the claim
frequency distribution, obtaining $\varphi_S(z) = P_N[\varphi_X(z)]$, which is the
characteristic function, that is, the discrete Fourier transform of the aggregate
claims distribution, a vector of $n = 2^r$ values.


4. Apply the Inverse Fast Fourier Transform (IFFT), which is identical to
the FFT except for a sign change and a division by n [see (6.31)]. This
gives a vector of length $n = 2^r$ values representing the exact distribution
of aggregate claims for the discretized severity model.


The FFT procedure requires a discretization of the severity distribution.
When the number of points in the severity distribution is less than $n = 2^r$,
the severity distribution vector must be padded with zeros until it is of length
n.

When the severity distribution places probability on values beyond x = n,
as is the case with most distributions discussed in Chapter 4, the probability
that is missed in the right-hand tail beyond n can introduce some minor error
in the final solution because the function and its transform are both assumed
to be periodic with period n, when in reality they are not. The authors suggest
putting all the remaining probability at the final point at x = n so that the
probabilities add up to 1 exactly. This allows for periodicity to be used for
the severity distribution in the FFT algorithm and ensures that the final set
of aggregate probabilities will sum to 1. However, it is imperative that n be
selected to be large enough so that almost all the aggregate probability occurs
by the nth point. The following example provides an extreme illustration.


Example 6.28 Suppose the random variable X takes on the values 1, 2, and
3 with probabilities 0.5, 0.4, and 0.1, respectively. Further suppose the number
of claims has the Poisson distribution with parameter $\lambda = 3$. Use the FFT to
obtain the distribution of S using n = 8 and n = 4,096.


In either case, the probability distribution of X is completed by adding
one zero at the beginning (because S places probability at zero, the initial
representation of X must also have the probability at zero given) and either
4 or 4,092 zeros at the end. The results from employing the FFT and IFFT
appear in Table 6.17. For the case n = 8, the eight probabilities sum to 1. For
the case n = 4,096, the probabilities also sum to 1, but there is not room here
to show them all. It is easy to apply the recursive formula to this problem,
which verifies that all of the entries for n = 4,096 are accurate to the five
decimal places presented. On the other hand, with n = 8, the FFT gives
values that are clearly distorted. If any generalization can be made, it is that
more of the extra probability has been added to the smaller values of S. □


Table 6.17 Aggregate probabilities computed by the FFT and IFFT

        n = 8      n = 4,096
  s     f_S(s)     f_S(s)
  0     0.11227    0.04979
  1     0.11821    0.07468
  2     0.14470    0.11575
  3     0.15100    0.13256
  4     0.14727    0.13597
  5     0.13194    0.12525
  6     0.10941    0.10558
  7     0.08518    0.08305
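The four steps of the FFT method, applied to this example, can be sketched as follows. The transform here is a plain recursive routine written for illustration rather than a library call, and the helper names are hypothetical. The last lines show where the distortion at n = 8 comes from: each n = 8 probability is the sum of the true probabilities at s, s + 8, s + 16, and so on.

```python
import cmath
import math

def fft(values, sign=1):
    """Recursive radix-2 transform; sign=+1 follows (6.30), sign=-1 the sum in (6.31)."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft(values[0::2], sign), fft(values[1::2], sign)
    m = n // 2
    out = [0j] * n
    for k in range(m):
        term = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + m] = even[k] + term, even[k] - term
    return out

def aggregate_by_fft(severity, lam, n):
    """Compound Poisson probabilities on 0, 1, ..., n - 1 (n a power of 2)."""
    padded = list(severity) + [0.0] * (n - len(severity))      # step 1, with zero padding
    phi_x = fft(padded, +1)                                    # step 2
    phi_s = [cmath.exp(lam * (z - 1.0)) for z in phi_x]        # step 3, Poisson pgf
    return [z.real / n for z in fft(phi_s, -1)]                # step 4

severity = [0.0, 0.5, 0.4, 0.1]     # probabilities at 0, 1, 2, 3
agg_8 = aggregate_by_fft(severity, 3.0, 8)
agg_4096 = aggregate_by_fft(severity, 3.0, 4096)
wrapped = [sum(agg_4096[s::8]) for s in range(8)]   # true probabilities folded modulo 8

for s in range(8):
    print(s, round(agg_8[s], 5), round(agg_4096[s], 5), round(wrapped[s], 5))
```

The first two printed columns can be compared with Table 6.17, and the third shows that the n = 8 values are the wrapped-around version of the n = 4,096 values.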


Because the FFT and IFFT algorithms are available in many computer 
software packages and because the computer code is short, easy to write, 
and available (e.g., [107], pp. 411-412), no further technical details about the 
algorithm are given here. The reader can read any one of numerous books 
dealing with FFTs for a more detailed understanding of the algorithm. The
technical details which allow the speeding up of the calculations from $O(n^2)$ to
$O(n \log_2 n)$ relate to the detailed properties of the discrete Fourier transform.
Robertson [111] gives a good explanation of the FFT as applied to calculating 
the distribution of aggregate claims. 


6.9.2 Direct numerical inversion 


The inversion of the characteristic function (6.28) has been done using ap- 
proximate integration methods by Heckman and Meyers [51] in the case of 
Poisson, binomial, and negative binomial claim frequencies and continuous 
severity distributions. The method is easily extended to other frequency dis- 
tributions. 

In this method, the severity distribution function is replaced by a piecewise
linear distribution. It further uses a maximum single-loss amount so the cdf
jumps to 1 at the maximum possible individual loss. The range of the severity
random variable is divided into intervals of possibly unequal length. The
remaining steps parallel those of the FFT method. Consider the cdf of the
severity distribution $F_X(x)$, $0 \le x < \infty$. Let $0 = x_0 < x_1 < \cdots < x_n$ be
arbitrarily selected loss values. Then the probability that losses lie in the
interval $(x_{k-1}, x_k]$ is given by $f_k = F_X(x_k) - F_X(x_{k-1})$. Using a uniform
density $d_k$ over this interval results in the approximating density function
$f^*(x) = d_k = f_k/(x_k - x_{k-1})$ for $x_{k-1} < x \le x_k$. Any remaining probability
$f_{n+1} = 1 - F_X(x_n)$ is placed as a spike at $x_n$. This approximating pdf
is selected to make evaluation of the cf easy. It is not required for direct
inversion. The cf of the approximating severity distribution is

\[ \varphi_X(z) = \int_0^{\infty} e^{izx}\,dF_X(x)
   = \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} d_k e^{izx}\,dx + f_{n+1} e^{izx_n}
   = \sum_{k=1}^{n} d_k \frac{e^{izx_k} - e^{izx_{k-1}}}{iz} + f_{n+1} e^{izx_n}. \]

The cf can be separated into real and imaginary parts by using Euler's formula

\[ e^{i\theta} = \cos(\theta) + i \sin(\theta). \]

Then the real part of the cf is

\[ a(z) = \mathrm{Re}[\varphi_X(z)] = \frac{1}{z} \sum_{k=1}^{n} d_k[\sin(zx_k) - \sin(zx_{k-1})]
   + f_{n+1} \cos(zx_n) \]

and the imaginary part is

\[ b(z) = \mathrm{Im}[\varphi_X(z)] = \frac{1}{z} \sum_{k=1}^{n} d_k[\cos(zx_{k-1}) - \cos(zx_k)]
   + f_{n+1} \sin(zx_n). \]
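The real and imaginary parts can be evaluated and checked against the complex form of the cf. In the sketch below the break points and interval probabilities are hypothetical, the Poisson frequency in the last function is chosen only for illustration, and the names are not taken from Heckman and Meyers.

```python
import cmath
import math

# Hypothetical break points x_0 < x_1 < ... < x_n and interval probabilities f_1..f_n.
XS = [0.0, 100.0, 500.0, 1000.0]
PROBS = [0.50, 0.30, 0.15]
TAIL = 1.0 - sum(PROBS)          # f_{n+1}, the spike placed at x_n

def cf_real_imag(z):
    """Return a(z) and b(z) for the piecewise uniform severity (z must be nonzero)."""
    a = TAIL * math.cos(z * XS[-1])
    b = TAIL * math.sin(z * XS[-1])
    for k in range(1, len(XS)):
        d_k = PROBS[k - 1] / (XS[k] - XS[k - 1])
        a += d_k * (math.sin(z * XS[k]) - math.sin(z * XS[k - 1])) / z
        b += d_k * (math.cos(z * XS[k - 1]) - math.cos(z * XS[k])) / z
    return a, b

def cf_complex(z):
    """The same cf computed from the complex exponential form, as a cross-check."""
    total = TAIL * cmath.exp(1j * z * XS[-1])
    for k in range(1, len(XS)):
        d_k = PROBS[k - 1] / (XS[k] - XS[k - 1])
        total += d_k * (cmath.exp(1j * z * XS[k]) - cmath.exp(1j * z * XS[k - 1])) / (1j * z)
    return total

def aggregate_polar(z, lam):
    """Modulus and argument of the aggregate cf for a Poisson(lam) frequency."""
    a, b = cf_real_imag(z)
    return cmath.polar(cmath.exp(lam * (complex(a, b) - 1.0)))

a, b = cf_real_imag(0.01)
print(a, b, cf_complex(0.01))
print(aggregate_polar(0.01, 5.0))
```

For small z the ratio b(z)/z approaches the mean of the approximating severity, 277.5 for these inputs, which gives a quick sanity check on the break points.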


The cf of aggregate losses (6.28) is obtained as

\[ \varphi_S(z) = P_N[\varphi_X(z)] = P_N[a(z) + ib(z)], \]

which can be rewritten as

\[ \varphi_S(z) = r(z) e^{i\theta(z)} \]

because it is complex valued.
The distribution of aggregate claims is obtained as

\[ F_S(x) = \frac{1}{2} + \frac{1}{\pi} \int_0^{\infty} \frac{r(z/\sigma)}{z}
   \sin\left(\frac{xz}{\sigma} - \theta(z/\sigma)\right) dz, \qquad (6.33) \]

where $\sigma$ is the standard deviation of the distribution of aggregate losses.
Approximate integration techniques are used to evaluate (6.33) for any value
of x. The reader is referred to Heckman and Meyers [51] for details. They
also obtain the net stop-loss (excess pure) premium for the aggregate loss
distribution as

\[ P(d) = E[(S - d)_+] = \int_d^{\infty} (s - d)\,dF_S(s)
   = \frac{\sigma}{\pi} \int_0^{\infty} \frac{r(z/\sigma)}{z^2}
     \left[\cos\theta(z/\sigma) - \cos\left(\frac{zd}{\sigma} - \theta(z/\sigma)\right)\right] dz
     + \mu - \frac{d}{2} \qquad (6.34) \]

from (6.33), where $\mu$ is the mean of the aggregate loss distribution and d is
the deductible.

Equation (6.33) provides only a single value of the distribution, while (6.34) 
provides only one value of the premium, but it does so quickly. The error of 
approximation depends on the spacing of the numerical integration method 
but is controllable. 


6.9.3 Exercise 


6.59 Repeat Exercises 6.51 and 6.52 using the inversion method. 


6.10 COMPARISON OF METHODS 


The recursive method has some significant advantages. The time required to 
compute an entire distribution of n points is reduced to $O(n^2)$ from $O(n^3)$ for
the direct convolution method. Furthermore, it provides exact values when 
the severity distribution is itself discrete (arithmetic). The only source of 
error is in the discretization of the severity distribution. Except for binomial 
models, the calculations are guaranteed to be numerically stable. This method 
is very easy to program in a few lines of computer code. However, it has a few 
disadvantages. The recursive method only works for the classes of frequency 
distributions described in Section 4.6. Using distributions not based on the 
(a, b,0) and (a,b, 1) classes requires modification of the formula or developing 
a new recursion. Numerous other recursions have recently been developed in 
the actuarial and statistical literature. 
The FFT method is easy to use in that it uses standard routines available
with many software packages. It is faster than the recursive method when n is
large because it requires calculations of order $n \log_2 n$ rather than $n^2$. However,
if the severity distribution has a fixed (and not too large) number of points, the
recursive method will require fewer computations because the sum in (6.14)
will have at most m terms, reducing the order of required computations to
be of order n, rather than $n^2$ in the case of no upper limit of the severity.
The FFT method can be extended to the case where the severity distribution
can take on negative values. Like the recursive method, it produces the entire
distribution.

The direct inversion method has been demonstrated to be very fast in 
calculating a single value of the aggregate distribution or the net stop-loss 
(excess pure) premium for a single deductible d. However, it requires a major 
computer programming effort. It has been developed by Heckman and Meyers 
[51] specifically for (a,b,0) frequency models. It is possible to generalize the 
computer code to handle any distribution with a pgf that is a relatively simple 
function. This method is much faster than the recursive method when the 
expected number of claims is large. The speed does not depend on the size of
$\lambda$ in the case of the Poisson frequency model. In addition to being complicated
to program, the method involves approximate integration whose errors depend 
on the method and interval size. 

Through the use of transforms, both the FFT and inversion methods are
able to handle convolutions efficiently. For example, suppose a reinsurance
agreement was to cover the aggregate losses of three groups, each with unique
frequency and severity distributions. If $S_i$, $i = 1, 2, 3$, are the aggregate
losses for each group, the characteristic function for the total aggregate losses
$S = S_1 + S_2 + S_3$ is $\varphi_S(z) = \varphi_{S_1}(z)\varphi_{S_2}(z)\varphi_{S_3}(z)$ and so the only extra work is
some multiplications prior to the inversion step. The recursive method does
not accommodate convolutions as easily.
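The three-group case can be illustrated with discrete transforms. In the sketch below the three groups, their Poisson means, and their severities are hypothetical; a direct O(n^2) transform is used only because the grid is small. Since all three frequencies are Poisson here, the result can be cross-checked against a single compound Poisson recursion with the combined mean and the mean-weighted mixture of severities.

```python
import cmath
import math

N_POINTS = 64   # hypothetical grid size; wrap-around is negligible for these inputs

# Hypothetical groups: (Poisson mean, severity probabilities at 0, 1, 2, ...)
GROUPS = [
    (0.5, [0.0, 0.6, 0.4]),
    (1.0, [0.0, 0.2, 0.5, 0.3]),
    (0.8, [0.0, 1.0]),
]

def dft(values, sign):
    """Direct discrete transform; sign=+1 as in (6.30), sign=-1 for the inverse sum."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2 * cmath.pi * 1j * idx * k / n)
                for idx, v in enumerate(values))
            for k in range(n)]

def aggregate_cf(lam, severity):
    """Transform of one group's compound Poisson aggregate losses."""
    padded = severity + [0.0] * (N_POINTS - len(severity))
    return [cmath.exp(lam * (phi - 1.0)) for phi in dft(padded, +1)]

# Multiply the three transforms point by point, then invert once.
cfs = [aggregate_cf(lam, sev) for lam, sev in GROUPS]
total_cf = [x * y * z for x, y, z in zip(*cfs)]
total_pmf = [z.real / N_POINTS for z in dft(total_cf, -1)]

# Cross-check: one compound Poisson with combined mean and mixed severity.
lam_total = sum(lam for lam, _ in GROUPS)
mixed = [0.0] * 4
for lam, sev in GROUPS:
    for x, p in enumerate(sev):
        mixed[x] += lam * p / lam_total
check = [math.exp(-lam_total * (1.0 - mixed[0]))]
for k in range(1, N_POINTS):
    total = sum(x * mixed[x] * check[k - x] for x in range(1, min(k, 3) + 1))
    check.append(lam_total / k * total)

print([round(p, 6) for p in total_pmf[:6]])
print([round(p, 6) for p in check[:6]])
```

The multiplication step does not depend on the frequencies being Poisson; that assumption is used only to obtain an independent check.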

The Heckman—Meyers method has some technical difficulties when being 
applied to severity distributions that are of the discrete type or have some 
anomalies, such as heaping of losses at some round number (e.g., 1,000,000). 
At any jump in the severity distribution function, a very short interval
containing the jump needs to be defined in setting up the points $(x_1, x_2, \ldots, x_n)$.

We save a discussion of simulation for last because it differs greatly from
the other methods. For those not familiar with this method, an introduction is 
provided in Chapter 17. The major advantage is a big one. If you can carefully 
articulate the model, you should be able to obtain the aggregate distribution 
by simulation. The programming effort may take a little time but can be done 
in a straightforward manner. Today’s computers will conduct the simulation 
in a reasonable amount of time. Most of the analytic methods were developed 
as a response to the excessive computing time that simulations used to require. 
That is less of a problem now. On the other hand, it is difficult to write a 
general-purpose simulation program. Instead, it is possibly necessary to write 
a new routine as each problem occurs. Thus it is probably best to save the 
simulation approach for those problems that cannot be solved by the other 
methods. Then, of course, it is worth the effort because there is no alternative. 

One other drawback of simulation occurs in extremely low frequency situ- 
ations (which is where recursion excels). For example, consider an individual 
excess-of-loss reinsurance in which reinsurance benefits are paid on individual 
losses above 1,000,000, an event which occurs about 1 time in 100, but when it
does, the tail is extremely long (for example, a Pareto distribution with small
$\alpha$). The simulation will have to discard 99% of the generated losses and then
will need a large number of those that exceed the deductible (due to the large 
variation in losses). It may take a long time to obtain a reliable answer. One 
possible solution for simulation is to work with the conditional distribution of 
the loss variable, given that a payment has been made. 

No method is clearly superior for all problems. Each method has both 
advantages and disadvantages when compared with the others. What we 
really have is an embarrassment of riches. Twenty-five years ago, actuaries 
wondered if there would ever be effective methods for determining aggregate 
distributions. Today we can choose from several. 


6.11 THE INDIVIDUAL RISK MODEL


6.11.1 Parametric approximation 


The individual risk model represents the aggregate loss as a fixed sum of
independent (but not necessarily identically distributed) random variables:

\[ S = X_1 + X_2 + \cdots + X_n. \]

This is usually thought of as the sum of the losses from n insurance
contracts, for example, n persons covered under a group insurance policy.

The individual risk model was originally developed for life insurance in
which the probability of death within a year is $q_j$ and the fixed benefit paid
for the death of the jth person is $b_j$. In this case, the distribution of the loss
to the insurer for the jth policy is

\[ f_{X_j}(x) = \begin{cases} 1 - q_j, & x = 0, \\ q_j, & x = b_j. \end{cases} \]

In this case the mean and variance of aggregate losses are

\[ E(S) = \sum_{j=1}^{n} b_j q_j \]

and

\[ \mathrm{Var}(S) = \sum_{j=1}^{n} b_j^2 q_j (1 - q_j) \]

because the $X_j$s are assumed to be independent. Then, the pgf of aggregate
losses is

\[ P_S(z) = \prod_{j=1}^{n} (1 - q_j + q_j z^{b_j}). \qquad (6.35) \]

In the special case where all the risks are identical with $q_j = q$ and $b_j = 1$,
the pgf reduces to

\[ P_S(z) = [1 + q(z - 1)]^n, \]

and in this case S has a binomial distribution.

The individual risk model can be generalized as follows. Let $X_j = I_j B_j$,
where $I_1, \ldots, I_n, B_1, \ldots, B_n$ are independent. The random variable $I_j$ is an
indicator variable that takes on the value 1 with probability $q_j$ and the value
0 with probability $1 - q_j$. This variable indicates whether or not the jth policy
produced a payment. The random variable $B_j$ can have any distribution and
represents the amount of the payment in respect of the jth policy given that
a payment was made. In the life insurance case, $B_j$ is degenerate, with all
probability on the value $b_j$. If we let $\mu_j = E(B_j)$ and $\sigma_j^2 = \mathrm{Var}(B_j)$ then

\[ E(S) = \sum_{j=1}^{n} q_j \mu_j \qquad (6.36) \]

and

\[ \mathrm{Var}(S) = \sum_{j=1}^{n} [q_j \sigma_j^2 + q_j (1 - q_j) \mu_j^2]. \qquad (6.37) \]

You are asked to verify these formulas in Exercise 6.60. The following example 
is a simple version of this situation. 


Example 6.29 Consider a group life insurance contract with an accidental
death benefit. Assume that for all members the probability of death in the
next year is 0.01 and that 30% of deaths are accidental. For 50 employees
the benefit for an ordinary death is 50,000 and for an accidental death it is
100,000. For the remaining 25 employees the benefits are 75,000 and 150,000,
respectively. Develop an individual risk model and determine its mean and
variance.


For all 75 employees $q_j = 0.01$. For 50 employees, $B_j$ takes on the value
50,000 with probability 0.7 and 100,000 with probability 0.3. For them, $\mu_j =
65{,}000$ and $\sigma_j^2 = 525{,}000{,}000$. For the remaining 25 employees $B_j$ takes on
the value 75,000 with probability 0.7 and 150,000 with probability 0.3. For
them, $\mu_j = 97{,}500$ and $\sigma_j^2 = 1{,}181{,}250{,}000$. Then

$$E(S) = 50(0.01)(65{,}000) + 25(0.01)(97{,}500) = 56{,}875$$

and

$$\begin{aligned}
\mathrm{Var}(S) &= 50(0.01)(525{,}000{,}000) + 50(0.01)(0.99)(65{,}000)^2 \\
&\quad + 25(0.01)(1{,}181{,}250{,}000) + 25(0.01)(0.99)(97{,}500)^2 \\
&= 5{,}001{,}984{,}375.
\end{aligned}$$


□

Table 6.18 Employee data for Example 6.30

Employee,   Age               Benefit,   Mortality rate,
j           (years)   Sex     b_j        q_j
 1          20        M       15,000     0.00149
 2          23        M       16,000     0.00142
 3          27        M       20,000     0.00128
 4          30        M       28,000     0.00122
 5          31        M       31,000     0.00123
 6          46        M       18,000     0.00353
 7          47        M       26,000     0.00394
 8          49        M       24,000     0.00484
 9          64        M       60,000     0.02182
10          17        F       14,000     0.00050
11          22        F       17,000     0.00050
12          26        F       19,000     0.00054
13          37        F       30,000     0.00103
14          55        F       55,000     0.00479
Total                         373,000
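The moment formulas (6.36) and (6.37) are easy to check numerically. The following is a minimal Python sketch applied to the data of Example 6.29; the function names `severity_moments` and `individual_risk_moments` are illustrative choices, not part of the text.

```python
def severity_moments(amounts, probs):
    """Mean and variance of a discrete payment amount B_j."""
    mu = sum(a * p for a, p in zip(amounts, probs))
    second = sum(a * a * p for a, p in zip(amounts, probs))
    return mu, second - mu ** 2


def individual_risk_moments(classes):
    """Mean and variance of S from (6.36) and (6.37).

    Each class is a tuple (number of policies, q, mu, sigma2); policies
    within a class are identical and all policies are independent.
    """
    mean = sum(n * q * mu for n, q, mu, s2 in classes)
    var = sum(n * (q * s2 + q * (1 - q) * mu ** 2) for n, q, mu, s2 in classes)
    return mean, var


# Example 6.29: 70% ordinary deaths, 30% accidental deaths, q = 0.01 for all.
mu_a, var_a = severity_moments([50_000, 100_000], [0.7, 0.3])
mu_b, var_b = severity_moments([75_000, 150_000], [0.7, 0.3])
mean_s, var_s = individual_risk_moments(
    [(50, 0.01, mu_a, var_a), (25, 0.01, mu_b, var_b)]
)
print(mean_s, var_s)
```

Up to floating-point rounding, the printed values agree with the mean of 56,875 and variance of 5,001,984,375 obtained in the example.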


When the risks are different, the probabilities defined by pgf (6.35) can 
be computed exactly or approximately. A normal, gamma, lognormal, or any 
other distribution can be used to approximate the distribution. This is usually 
done by matching the first few moments. Because the normal, gamma, and 
lognormal distributions each have two parameters, the mean and variance are 
sufficient. 


Example 6.30 (Group life insurance) A small manufacturing business has
a group life insurance contract on its 14 permanent employees. The actuary
for the insurer has selected a mortality table to represent the mortality of the
group. Each employee is insured for the amount of his or her salary rounded
up to the next 1,000 dollars. The group's data are given in Table 6.18.

If the insurer adds a 45% relative loading to the net (pure) premium, what
are the chances that it will lose money in a given year? Use the normal and
lognormal approximations.


The mean and variance of the aggregate losses for the group are

$$E(S) = \sum_{j=1}^{14} b_j q_j = 2{,}054.41$$

and

$$\mathrm{Var}(S) = \sum_{j=1}^{14} b_j^2 q_j (1 - q_j) = 1.02534 \times 10^{8}.$$

The premium being charged is $1.45 \times 2{,}054.41 = 2{,}978.89$. For the normal
approximation (in units of 1,000), the mean is 2.05441 and the variance is
102.534. Then the probability of a loss is

$$\begin{aligned}
\Pr(S > 2.97889) &= \Pr\left[Z > \frac{2.97889 - 2.05441}{(102.534)^{1/2}}\right] \\
&= \Pr(Z > 0.0913) \\
&= 0.46 \text{ or } 46\%.
\end{aligned}$$

For the lognormal approximation (as in Example 6.5)

$$\mu + \tfrac{1}{2}\sigma^2 = \ln 2.05441 = 0.719989$$

and

$$2\mu + 2\sigma^2 = \ln(102.534 + 2.05441^2) = 4.670533.$$

From this $\mu = -0.895289$ and $\sigma^2 = 3.230555$. Then

$$\begin{aligned}
\Pr(S > 2.97889) &= 1 - \Phi\left[\frac{\ln 2.97889 + 0.895289}{(3.230555)^{1/2}}\right] \\
&= 1 - \Phi(1.105) \\
&= 0.13 \text{ or } 13\%.
\end{aligned}$$
□
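The two moment-matching approximations of Example 6.30 can be reproduced with a few lines of Python. This is a sketch under the example's own assumptions (benefits in units of 1,000 and a 45% loading); the helper name `std_normal_cdf` and the variable names are illustrative.

```python
import math

# (benefit in thousands, mortality rate) for the 14 employees of Table 6.18
EMPLOYEES = [
    (15, 0.00149), (16, 0.00142), (20, 0.00128), (28, 0.00122), (31, 0.00123),
    (18, 0.00353), (26, 0.00394), (24, 0.00484), (60, 0.02182), (14, 0.00050),
    (17, 0.00050), (19, 0.00054), (30, 0.00103), (55, 0.00479),
]


def std_normal_cdf(z):
    """Standard normal cdf written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean = sum(b * q for b, q in EMPLOYEES)
var = sum(b * b * q * (1 - q) for b, q in EMPLOYEES)
premium = 1.45 * mean

# Normal approximation: match the mean and variance directly.
p_normal = 1 - std_normal_cdf((premium - mean) / math.sqrt(var))

# Lognormal approximation: solve mu + sigma^2/2 = ln(mean) and
# 2*mu + 2*sigma^2 = ln(var + mean^2) for mu and sigma^2.
sigma2 = math.log(var + mean ** 2) - 2 * math.log(mean)
mu = math.log(mean) - sigma2 / 2
p_lognormal = 1 - std_normal_cdf((math.log(premium) - mu) / math.sqrt(sigma2))

print(round(p_normal, 2), round(p_lognormal, 2))
```

The two probabilities round to the 46% and 13% found above.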


In the next subsection we present several ways of obtaining the exact
distribution of $S$ for the case where the benefit amounts are fixed.


6.11.2 Exact calculation of the aggregate distribution 


6.11.2.1 Direct calculation The pf of aggregate losses is given by

$$f_S(x) = f_{X_1} * f_{X_2} * \cdots * f_{X_n}(x), \tag{6.38}$$

where

$$f_{X_j}(x) = \begin{cases} p_j = 1 - q_j, & x = 0, \\ q_j, & x = b_j. \end{cases}$$

The density (6.38) can be calculated recursively over the partial sums $S_j =
S_{j-1} + X_j$ for $j = 2, 3, \ldots, n$ beginning with $S_1 = X_1$. Then

$$\begin{aligned}
f_{S_j}(x) &= \begin{cases} f_{S_{j-1}}(x) f_{X_j}(0), & x < b_j, \\ f_{S_{j-1}}(x) f_{X_j}(0) + f_{S_{j-1}}(x - b_j) f_{X_j}(b_j), & x \ge b_j, \end{cases} \\
&= \begin{cases} p_j f_{S_{j-1}}(x), & x < b_j, \\ p_j f_{S_{j-1}}(x) + q_j f_{S_{j-1}}(x - b_j), & x \ge b_j. \end{cases}
\end{aligned}$$

Table 6.19 Cumulative probabilities for Example 6.31

 x   F_S(x)         x   F_S(x)         x   F_S(x)         x   F_S(x)
 0   0.95273905    20   0.96157969    40   0.97335098    60   0.99933062
 1   0.95273905    21   0.96157969    41   0.97335892    61   0.99933187
 2   0.95273905    22   0.96157969    42   0.97338128    62   0.99933191
 3   0.95273905    23   0.96157969    43   0.97338740    63   0.99933193
 4   0.95273905    24   0.96621337    44   0.97340884    64   0.99933198
 5   0.95273905    25   0.96621337    45   0.97341351    65   0.99933202
 6   0.95273905    26   0.96998201    46   0.97342561    66   0.99933206
 7   0.95273905    27   0.96998201    47   0.97342840    67   0.99933209
 8   0.95273905    28   0.97114577    48   0.97343397    68   0.99933217
 9   0.95273905    29   0.97114648    49   0.97343866    69   0.99933450
10   0.95273905    30   0.97212950    50   0.97345889    70   0.99934141
11   0.95273905    31   0.97330507    51   0.97346040    71   0.99934796
12   0.95273905    32   0.97330747    52   0.97346606    72   0.99935031
13   0.95273905    33   0.97331344    53   0.97346608    73   0.99936659
14   0.95321566    34   0.97331962    54   0.97347547    74   0.99937973
15   0.95463736    35   0.97332386    55   0.97806678    75   0.99941735
16   0.95599217    36   0.97332585    56   0.97807068    76   0.99944759
17   0.95646878    37   0.97332829    57   0.97807536    77   0.99945823
18   0.95984386    38   0.97333493    58   0.97807660    78   0.99953355
19   0.96035862    39   0.97334251    59   0.97807808    79   0.99956734


If we wish to calculate the distribution of total claims up to some value r, 
the computer time involved, as measured by the number of multiplications, 
can be seen to be of order nr. If both r and n are large (e.g., r = 10,000 and 
n = 10,000), the number of computations can be prohibitive. 


Example 6.31 (Example 6.30 continued) Use the direct method to determine
the pf of $S$ as well as the probability required in Example 6.30.

$$\begin{aligned}
f_{S_1}(0) &= 0.99851, \\
f_{S_1}(15) &= 0.00149, \\
f_{S_2}(0) &= p_2 f_{S_1}(0) = 0.99709212, \\
f_{S_2}(15) &= p_2 f_{S_1}(15) = 0.00148788, \\
f_{S_2}(16) &= p_2 f_{S_1}(16) + q_2 f_{S_1}(0) = 0.00141788, \\
f_{S_2}(31) &= p_2 f_{S_1}(31) + q_2 f_{S_1}(15) = 0.0000021158.
\end{aligned}$$

The final values of the cdf $F_S(x)$ for $x = 0, \ldots, 79$ are given in Table 6.19.

From Table 6.19, the probability of exceeding 2,978.89 is seen to be 0.047,
which shows that both approximations in Example 6.30 are poor. □
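The direct method of Example 6.31 amounts to folding one policy at a time into the running pf. A minimal Python sketch follows, assuming benefits are expressed as integers in units of 1,000; the function name `direct_pf` is an illustrative choice.

```python
from itertools import accumulate

# (benefit in thousands, mortality rate) for the 14 employees of Table 6.18
EMPLOYEES = [
    (15, 0.00149), (16, 0.00142), (20, 0.00128), (28, 0.00122), (31, 0.00123),
    (18, 0.00353), (26, 0.00394), (24, 0.00484), (60, 0.02182), (14, 0.00050),
    (17, 0.00050), (19, 0.00054), (30, 0.00103), (55, 0.00479),
]


def direct_pf(policies, max_x):
    """pf of S on 0..max_x, convolving in one two-point policy at a time."""
    f = [1.0] + [0.0] * max_x              # pf of the empty sum
    for b, q in policies:
        p = 1.0 - q
        g = [0.0] * (max_x + 1)
        for x in range(max_x + 1):
            g[x] = p * f[x]                # policy j produces no claim
            if x >= b:
                g[x] += q * f[x - b]       # policy j pays its benefit b
        f = g
    return f


pf_two = direct_pf(EMPLOYEES[:2], 31)      # S_2, the first two employees
pf = direct_pf(EMPLOYEES, 79)
cdf = list(accumulate(pf))
prob_loss = 1.0 - cdf[2]                   # Pr(S > 2.97889) = Pr(S > 2)
print(pf_two[31], cdf[0], prob_loss)
```

Each policy costs one pass over the support, which is the order $nr$ operation count mentioned earlier in this subsection.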


This approach is reasonable when $n$ is not too large, but for larger groups
an alternative method is needed.

6.11.2.2 Recursive calculation The following approach allows for the
distribution to be calculated recursively based on De Pril [26]. We first divide the
portfolio into subportfolios according to policy size and claim probability. Let
$n_{ij}$ be the number of policies with benefit $i$ (where $i = 1, 2, \ldots, r$)² and claim
probability $q_j$ (where $j = 1, 2, \ldots, m$). Then the pgf of total claims may be
written as

$$P_S(z) = \prod_{i=1}^{r} \prod_{j=1}^{m} (1 - q_j + q_j z^i)^{n_{ij}}.$$

The logarithm of the pgf is

$$\ln P_S(z) = \sum_{i=1}^{r} \sum_{j=1}^{m} n_{ij} \ln(1 - q_j + q_j z^i). \tag{6.39}$$

We now differentiate (6.39) to obtain

$$P_S'(z) = P_S(z) \sum_{i=1}^{r} \sum_{j=1}^{m} \frac{i\, n_{ij}\, q_j z^{i-1}}{1 - q_j + q_j z^i}. \tag{6.40}$$

Setting $z = 1$ in (6.40) yields the mean of the total claims distribution, namely

$$E(S) = P_S'(1) = \sum_{i=1}^{r} \sum_{j=1}^{m} i\, n_{ij}\, q_j.$$

Now, (6.40) may be rewritten as

$$\begin{aligned}
z P_S'(z) &= P_S(z) \sum_{i=1}^{r} \sum_{j=1}^{m} i\, n_{ij} \left(\frac{q_j z^i}{1 - q_j}\right) \left(1 + \frac{q_j z^i}{1 - q_j}\right)^{-1} \\
&= P_S(z) \sum_{i=1}^{r} \sum_{j=1}^{m} i\, n_{ij} \sum_{k=1}^{\infty} (-1)^{k-1} \left(\frac{q_j}{1 - q_j}\right)^{k} z^{ik}
\end{aligned} \tag{6.41}$$

for $|z| < \min_{i,j} |q_j^{-1}(1 - q_j)|^{1/i}$. The second term on the right-hand side of
(6.41) may be rewritten as

$$\sum_{i=1}^{r} \sum_{k=1}^{\infty} h(i, k) z^{ik},$$

where

$$h(i, k) = i(-1)^{k-1} \sum_{j=1}^{m} n_{ij} \left(\frac{q_j}{1 - q_j}\right)^{k}. \tag{6.42}$$

²As in the discretization of severity distributions, it is necessary that the benefit amounts
be in arithmetic progression. However, the monetary unit need not be 1. For example,
$i = 1, 2, \ldots$ could represent benefit amounts of 5,000, 10,000, ... .

Thus, (6.41) may be written as

$$z P_S'(z) = P_S(z) \sum_{i=1}^{r} \sum_{k=1}^{\infty} h(i, k) z^{ik}. \tag{6.43}$$

The coefficient of $z^x$ on the left-hand side of (6.43) is $x f_S(x)$, where $f_S(x)$ is
the coefficient of $z^x$ in $P_S(z)$. The right-hand side of (6.43) is a convolution,
and the coefficient of $z^x$ is thus given by

$$\sum_{ik \le x} h(i, k) f_S(x - ik). \tag{6.44}$$

A simpler way of writing (6.44) is

$$\sum_{i=1}^{x} \sum_{k=1}^{\lfloor x/i \rfloor} h(i, k) f_S(x - ik),$$

where $\lfloor \cdot \rfloor$ denotes the greatest integer function, that is, the largest integer
that is less than or equal to the argument. Finally, because $h(i, k) = 0$ if
$i > r$, one may equate coefficients of $z^x$ on both sides of (6.43) and divide by
$x$ to obtain

$$f_S(x) = \frac{1}{x} \sum_{i=1}^{x \wedge r} \sum_{k=1}^{\lfloor x/i \rfloor} h(i, k) f_S(x - ik), \quad x \ge 1. \tag{6.45}$$

Now,

$$f_S(0) = P_S(0) = \prod_{i=1}^{r} \prod_{j=1}^{m} (1 - q_j)^{n_{ij}}, \tag{6.46}$$

and from (6.45),

$$\begin{aligned}
f_S(1) &= h(1, 1) f_S(0), \\
f_S(2) &= \tfrac{1}{2}\{h(1, 1) f_S(1) + [h(1, 2) + h(2, 1)] f_S(0)\}.
\end{aligned}$$

The probabilities $\{f_S(x);\ x = 1, 2, \ldots\}$ may be calculated recursively using
(6.45), beginning with (6.46).

It can be seen from (6.42) that $h(i, k)$ is a weighted sum of the $k$th power
of $q_j/(1 - q_j)$, $j = 1, 2, \ldots, m$. When $q_j$ is close to zero, $[q_j/(1 - q_j)]^k$ is small.
Consequently, the magnitude of $h(i, k)$ decreases rapidly as $k$ increases. This
suggests that the inner summation in (6.45) can be limited to a small number
of terms without significant loss of accuracy in computations while speeding
up the computations considerably.

If we limit $k$ to a maximum of $K$ terms, let

$$f_S^{(K)}(x) = \frac{1}{x} \sum_{i=1}^{x \wedge r} \sum_{k=1}^{K \wedge \lfloor x/i \rfloor} h(i, k) f_S^{(K)}(x - ik) \tag{6.47}$$

denote the approximation using at most $K$ terms in (6.45). In a later paper,
De Pril [27] shows that, if $q_j < \frac{1}{2}$, $j = 1, 2, \ldots, m$, then

$$\sum_{x=0}^{M} |f_S(x) - f_S^{(K)}(x)| < e^{\delta(K)} - 1, \tag{6.48}$$

where

$$\delta(K) = \frac{1}{K + 1} \sum_{i=1}^{r} \sum_{j=1}^{m} n_{ij} \frac{1 - q_j}{1 - 2q_j} \left(\frac{q_j}{1 - q_j}\right)^{K+1} \tag{6.49}$$

and $M = \sum_{i=1}^{r} \sum_{j=1}^{m} i\, n_{ij}$ is the maximum possible aggregate claim amount.

The value of $\delta(K)$ is easily calculated for any value of $K$. Equation (6.48)
provides an upper bound on the sum of the absolute errors over the entire
distribution of aggregate claims and can be used to guarantee accuracy of
results when a limited number of terms is used in (6.47).

Table 6.20 Values of n_ij for Example 6.32

                 Benefit amount, i
 j   1000 q_j   14  15  16  17  18  19  20  24  26  28  30  31  55  60
 1     0.50      1   0   0   1   0   0   0   0   0   0   0   0   0   0
 2     0.54      0   0   0   0   0   1   0   0   0   0   0   0   0   0
 3     1.03      0   0   0   0   0   0   0   0   0   0   1   0   0   0
 4     1.22      0   0   0   0   0   0   0   0   0   1   0   0   0   0
 5     1.23      0   0   0   0   0   0   0   0   0   0   0   1   0   0
 6     1.28      0   0   0   0   0   0   1   0   0   0   0   0   0   0
 7     1.42      0   0   1   0   0   0   0   0   0   0   0   0   0   0
 8     1.49      0   1   0   0   0   0   0   0   0   0   0   0   0   0
 9     3.53      0   0   0   0   1   0   0   0   0   0   0   0   0   0
10     3.94      0   0   0   0   0   0   0   0   1   0   0   0   0   0
11     4.79      0   0   0   0   0   0   0   0   0   0   0   0   1   0
12     4.84      0   0   0   0   0   0   0   1   0   0   0   0   0   0
13    21.82      0   0   0   0   0   0   0   0   0   0   0   0   0   1


Example 6.32 (Example 6.30 continued) Determine the exact distribution
of aggregate losses using the recursive method.

The values of $q_j$ and the nonzero rows of the matrix of $n_{ij}$ values are given
in Table 6.20.

Using (6.49), we find that $\delta(1) = 5.947 \times 10^{-4}$, $\delta(2) = 3.900 \times 10^{-6}$,
$\delta(3) = 6.369 \times 10^{-8}$, and $\delta(4) = 1.131 \times 10^{-9}$. Hence, (6.47) with $K = 4$
will give us about 8 decimal place accuracy. The (nonzero) values of $h(i, k)$
computed using (6.42) are given in Table 6.21.

The values of $f_S(x)$ and the associated cdf $F_S(x)$ calculated using (6.47)
with $K = 4$ and (6.46) are given in Table 6.22.

It can be seen that the values in the last column of Table 6.22 are identical
to the corresponding values from Example 6.31 based on the direct method.
This method is especially useful when there are a large number of lives in a
group insurance contract. □

Table 6.21 Values of h(i, k) for Example 6.32

                                  h(i, k)
 i    k = 1             k = 2              k = 3             k = 4
14    7.0035018×10^-3   -3.5035025×10^-6   1.7526276×10^-9   -8.7675218×10^-13
15    2.2383351×10^-2   -3.3400962×10^-5   4.9841694×10^-8   -7.4374944×10^-11
16    2.2752309×10^-2   -3.2354221×10^-5   4.6008325×10^-8   -6.5424723×10^-11
17    8.5042522×10^-3   -4.2542531×10^-6   2.1281907×10^-9   -1.0646277×10^-12
18    6.3765090×10^-2   -2.2588816×10^-4   8.0020991×10^-7   -2.8347477×10^-9
19    1.0265543×10^-2   -5.5463884×10^-6   2.9966680×10^-9   -1.6190750×10^-12
20    2.5632810×10^-2   -3.2852048×10^-5   4.2104515×10^-8   -5.3962851×10^-11
24    1.1672495×10^-1   -5.6769638×10^-4   2.7610139×10^-6   -1.3428300×10^-8
26    1.0284521×10^-1   -4.0681297×10^-4   1.6091833×10^-6   -6.3652612×10^-9
28    3.4201726×10^-2   -4.1777074×10^-5   5.1030287×10^-8   -6.2332996×10^-11
30    3.0931860×10^-2   -3.1892665×10^-5   3.2883315×10^-8   -3.3904736×10^-11
31    3.8176959×10^-2   -4.7015487×10^-5   5.7900266×10^-8   -7.1305033×10^-11
55    2.6471800×10^-1   -1.2741022×10^-3   6.1323232×10^-6   -2.9515207×10^-8
60    1.3384040         -2.9855420×10^-2   6.6597689×10^-4   -1.4855768×10^-5

Table 6.22 Aggregate probabilities for Example 6.32

 x   f_S(x)            F_S(x)       |   x   f_S(x)            F_S(x)
 0   0.95273905        0.95273905   |  38   6.6436439×10^-6   0.97333493
14   4.7660783×10^-4   0.95321566   |  39   7.5742253×10^-6   0.97334251
15   1.4216995×10^-3   0.95463736   |  40   8.4744508×10^-6   0.97335098
16   1.3548133×10^-3   0.95599217   |  41   7.9416543×10^-6   0.97335892
17   4.7660783×10^-4   0.95646878   |  42   2.2356100×10^-5   0.97338128
18   3.3750829×10^-3   0.95984386   |  43   6.1253961×10^-6   0.97338740
19   5.1475706×10^-4   0.96035862   |  44   2.1435448×10^-5   0.97340884
20   1.2210690×10^-3   0.96157969   |  45   4.6721584×10^-6   0.97341351
24   4.6336840×10^-3   0.96621337   |  46   1.2100769×10^-5   0.97342561
26   3.7686403×10^-3   0.96998201   |  47   2.7915140×10^-6   0.97342840
28   1.1637614×10^-3   0.97114577   |  48   5.5621896×10^-6   0.97343397
29   7.1120536×10^-7   0.97114649   |  49   4.6964950×10^-6   0.97343866
30   9.8301077×10^-4   0.97212950   |  50   2.0226469×10^-5   0.97345889
31   1.1755723×10^-3   0.97330507   |  51   1.5103585×10^-6   0.97346040
32   2.3995910×10^-6   0.97330747   |  52   5.6661624×10^-6   0.97346607
33   5.9716305×10^-6   0.97331344   |  53   1.3705553×10^-8   0.97346608
34   6.1784053×10^-6   0.97331962   |  54   9.3923168×10^-6   0.97347547
35   4.2424878×10^-6   0.97332386   |  55   4.5913084×10^-3   0.97806678
36   1.9938909×10^-6   0.97332585   |  56   3.9003832×10^-6   0.97807068
37   2.4343694×10^-6   0.97332829   |  57   4.6823253×10^-6   0.97807536
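The recursion (6.45) and its truncated form (6.47) translate almost line for line into code. The sketch below is illustrative only: the function name `de_pril_pf` and its optional argument `K` are choices made here, benefits are assumed to be positive integers in the chosen monetary unit, and each policy is listed individually rather than through the counts $n_{ij}$.

```python
import math

# (benefit in thousands, mortality rate) for the 14 employees of Table 6.18
EMPLOYEES = [
    (15, 0.00149), (16, 0.00142), (20, 0.00128), (28, 0.00122), (31, 0.00123),
    (18, 0.00353), (26, 0.00394), (24, 0.00484), (60, 0.02182), (14, 0.00050),
    (17, 0.00050), (19, 0.00054), (30, 0.00103), (55, 0.00479),
]


def de_pril_pf(policies, max_x, K=None):
    """pf of S on 0..max_x by (6.45); with K set, the truncated form (6.47)."""
    by_benefit = {}                       # benefit i -> claim probabilities
    for b, q in policies:
        by_benefit.setdefault(b, []).append(q)

    def h(i, k):                          # equation (6.42)
        return i * (-1) ** (k - 1) * sum((q / (1 - q)) ** k for q in by_benefit[i])

    f = [0.0] * (max_x + 1)
    f[0] = math.prod(1.0 - q for _, q in policies)   # equation (6.46)
    for x in range(1, max_x + 1):
        total = 0.0
        for i in by_benefit:
            if i > x:
                continue
            k_max = x // i if K is None else min(K, x // i)
            for k in range(1, k_max + 1):
                total += h(i, k) * f[x - i * k]
        f[x] = total / x
    return f


pf_exact = de_pril_pf(EMPLOYEES, 57)
pf_k4 = de_pril_pf(EMPLOYEES, 57, K=4)
max_diff = max(abs(a - b) for a, b in zip(pf_exact, pf_k4))
print(pf_k4[14], sum(pf_k4), max_diff)
```

For clarity the sketch recomputes $h(i, k)$ on every use; caching those values, as Table 6.21 does, would be the natural refinement for a large portfolio.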


Example 6.33 (An expanded group) For the purpose of illustrating the effect
of the size of the portfolio, consider a portfolio consisting of 1,400 independent
lives, with exactly 100 lives like each life in the group life portfolio of Example
6.30. Determine the exact distribution of total losses.

The values of $n_{ij}$ are now either 100 or 0. From (6.42), it can be seen that
each $h(i, k)$ is now 100 times larger than that of the previous example. The
distribution of total claims can be computed as in the previous example. In
Table 6.23 we give some values of the distribution function of total claims (as
before, $x$ is measured in units of 1,000).

From Example 6.30, the mean and variance of total claims for this portfolio
of 1,400 lives are $\mu_1' = 205{,}441$ and $\mu_2 = 1.0253356 \times 10^{10}$. Also, the coefficient
of skewness is calculated as $\mu_3(\mu_2)^{-3/2} = 0.5267345$, which is exactly
one-tenth of the value of 5.267345 for the corresponding group of 14 lives. This
indicates that the distribution is much more symmetric. □

Table 6.23 Aggregate distribution for Example 6.33

  x   F_S(x)         x   F_S(x)         x   F_S(x)         x   F_S(x)
  0   0.00789581   200   0.51793382   400   0.96031865   600   0.99927281
  8   0.00789581   208   0.55208736   408   0.96528977   608   0.99939248
 16   0.01059183   216   0.57594360   416   0.96999808   616   0.99950270
 24   0.01906263   224   0.60583632   424   0.97360199   624   0.99958630
 32   0.02561396   232   0.63412023   432   0.97694855   632   0.99965590
 40   0.02979597   240   0.66687534   440   0.98031827   640   0.99971784
 48   0.03774849   248   0.68632432   448   0.98297484   648   0.99976679
 56   0.04770335   256   0.71292641   456   0.98515668   656   0.99980876
 64   0.06976756   264   0.73984823   464   0.98728185   664   0.99984205
 72   0.07610528   272   0.76061061   472   0.98914069   672   0.99987063
 80   0.10013432   280   0.78007667   480   0.99068429   680   0.99989481
 88   0.12399672   288   0.80016248   488   0.99194854   688   0.99991371
 96   0.13839960   296   0.82073766   496   0.99321771   696   0.99992950
104   0.15945674   304   0.83672920   504   0.99421982   704   0.99994275
112   0.18004988   312   0.85121374   512   0.99504627   712   0.99995364
120   0.22162539   320   0.86849112   520   0.99580992   720   0.99996222
128   0.23569706   328   0.88253629   528   0.99644624   728   0.99996932
136   0.26335063   336   0.89343529   536   0.99701494   736   0.99997545
144   0.30172723   344   0.90530546   544   0.99745896   744   0.99998004
152   0.33041683   352   0.91610001   552   0.99785787   752   0.99998383
160   0.35497038   360   0.92599796   560   0.99821941   760   0.99998707
168   0.38635038   368   0.93340851   568   0.99850204   768   0.99998955
176   0.42177800   376   0.94180939   576   0.99873887   776   0.99999162
184   0.45476409   384   0.94903173   584   0.99895067   784   0.99999326
192   0.47881084   392   0.95482932   592   0.99912971   792   0.99999461

There are numerous papers on the subject of the individual risk model. De
Pril [28] develops a generalization of this method to the case where the loss
can take on more than one value.


6.11.3 Compound Poisson approximation


Because of the computational complexity of calculating the distribution of
total claims for a portfolio of $n$ risks using the individual risk model, it has been
popular to attempt to approximate the distribution by using the compound
Poisson distribution. As was seen in Section 6.5, use of the compound Poisson
allows calculation of the total claims distribution by using a very simple
recursive procedure or by using the Fast Fourier Transform. The pgf (6.35)
of aggregate losses is

$$P_S(z) = \prod_{j=1}^{n} [1 + q_j (z^{b_j} - 1)].$$

By taking logarithms and using a Taylor series expansion of $\ln[1 + q_j (z^{b_j} - 1)]$,
we obtain

$$\ln P_S(z) = \sum_{j=1}^{n} \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} [q_j (z^{b_j} - 1)]^{k}.$$

Retaining only the first term in the inner sum yields the approximation

$$\ln P_S(z) = \sum_{j=1}^{n} \lambda_j (z^{b_j} - 1) = \lambda \sum_{j=1}^{n} \frac{\lambda_j}{\lambda} (z^{b_j} - 1), \tag{6.50}$$

where $\lambda_j = q_j$ and $\lambda = \sum_{j=1}^{n} \lambda_j$. This results in

$$P_S(z) = \exp\{\lambda[P_X(z) - 1]\},$$

which is the pgf of a compound Poisson distribution with individual loss
distribution pgf

$$P_X(z) = \frac{1}{\lambda} \sum_{j=1}^{n} \lambda_j z^{b_j}. \tag{6.51}$$

This distribution has pf

$$\Pr(X = x) = \frac{1}{\lambda} \sum_{\{j:\, b_j = x\}} \lambda_j. \tag{6.52}$$

The numerator sums all probabilities associated with amount $b_j$.

Note that the means of the frequency distribution and the aggregate loss
distribution match those of the exact distribution.


Example 6.34 (Example 6.30 continued) Consider the group life case of
Example 6.30. Derive a compound Poisson approximation.

Using the compound Poisson approximation of this section with Poisson
parameter $\lambda = \sum q_j = 0.04813$, the distribution function given in Table 6.24
is obtained.

When these values are compared to those of Example 6.31, it can be seen
that the maximum error of 0.0002708 occurs at $x = 0$. □

Table 6.24 Aggregate distribution for Example 6.34

 x   F_S(x)        x   F_S(x)        x   F_S(x)        x   F_S(x)
 0   0.9530099    20   0.9618348    40   0.9735771    60   0.9990974
 1   0.9530099    21   0.9618348    41   0.9735850    61   0.9990986
 2   0.9530099    22   0.9618348    42   0.9736072    62   0.9990994
 3   0.9530099    23   0.9618348    43   0.9736133    63   0.9990995
 4   0.9530099    24   0.9664473    44   0.9736346    64   0.9990995
 5   0.9530099    25   0.9664473    45   0.9736393    65   0.9990996
 6   0.9530099    26   0.9702022    46   0.9736513    66   0.9990997
 7   0.9530099    27   0.9702022    47   0.9736541    67   0.9990997
 8   0.9530099    28   0.9713650    48   0.9736708    68   0.9990998
 9   0.9530099    29   0.9713657    49   0.9736755    69   0.9991022
10   0.9530099    30   0.9723490    50   0.9736956    70   0.9991091
11   0.9530099    31   0.9735235    51   0.9736971    71   0.9991156
12   0.9530099    32   0.9735268    52   0.9737101    72   0.9991179
13   0.9530099    33   0.9735328    53   0.9737102    73   0.9991341
14   0.9534864    34   0.9735391    54   0.9737195    74   0.9991470
15   0.9549064    35   0.9735433    55   0.9782901    75   0.9991839
16   0.9562597    36   0.9735512    56   0.9782947    76   0.9992135
17   0.9567362    37   0.9735536    57   0.9782994    77   0.9992239
18   0.9601003    38   0.9735604    58   0.9783006    78   0.9992973
19   0.9606149    39   0.9735679    59   0.9783021    79   0.9993307


Some closely related approximations have been used. One popular one is
to let $\lambda_j$ in (6.50) be set to

$$\lambda_j = -\ln(1 - q_j), \quad j = 1, 2, \ldots, n. \tag{6.53}$$

This matches the no-loss probability $1 - q_j$ with the no-loss probability of the
Poisson distribution, $e^{-\lambda_j}$. This effectively replaces each life in the group by
a Poisson distribution. This approximation is appropriate in the context of a
group life insurance contract where a life is "replaced" upon death, leaving the
Poisson intensity unchanged by the death. Naturally the expected number of
losses is greater than $\sum_{j=1}^{n} q_j$. An alternative choice was proposed by Kornya
[79]. It used $\lambda_j = q_j/(1 - q_j)$ in (6.50). It results in an expected number of
losses that exceeds that using (6.53) (see Exercise 6.61).
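The compound Poisson approximation of Example 6.34 can be computed with the compound Poisson recursion referred to at the start of this section, $f_S(x) = (\lambda/x) \sum_y y f_X(y) f_S(x - y)$ with $f_S(0) = e^{-\lambda}$. The Python sketch below is illustrative: the function name `compound_poisson_pf` and its `rate` argument (which maps $q_j$ to $\lambda_j$) are assumptions made here so that the alternative choice (6.53) can be tried with the same code.

```python
import math
from itertools import accumulate

# (benefit in thousands, mortality rate) for the 14 employees of Table 6.18
EMPLOYEES = [
    (15, 0.00149), (16, 0.00142), (20, 0.00128), (28, 0.00122), (31, 0.00123),
    (18, 0.00353), (26, 0.00394), (24, 0.00484), (60, 0.02182), (14, 0.00050),
    (17, 0.00050), (19, 0.00054), (30, 0.00103), (55, 0.00479),
]


def compound_poisson_pf(policies, max_x, rate=lambda q: q):
    """pf on 0..max_x of the compound Poisson model built from (6.50)-(6.52)."""
    weight = {}                            # amount y -> lambda * f_X(y)
    for b, q in policies:
        weight[b] = weight.get(b, 0.0) + rate(q)
    lam = sum(weight.values())
    f = [0.0] * (max_x + 1)
    f[0] = math.exp(-lam)                  # no claims at all
    for x in range(1, max_x + 1):
        f[x] = sum(y * w * f[x - y] for y, w in weight.items() if y <= x) / x
    return f


cdf = list(accumulate(compound_poisson_pf(EMPLOYEES, 79)))
# Same recursion with lambda_j = -ln(1 - q_j), as in (6.53).
pf_no_loss = compound_poisson_pf(EMPLOYEES, 79, rate=lambda q: -math.log(1 - q))
print(cdf[0], cdf[14], pf_no_loss[0])
```

With the default rate the first values agree with Table 6.24, and with the rate of (6.53) the probability of no loss equals the exact value $\prod_j (1 - q_j)$, as the text indicates.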


6.11.3.1 More than one possible loss amount It was noted in the beginning
of this section that there may be more than one possible loss amount. Again
let $B_j$ be the random variable that measures the amount of the loss given that
there was a loss and let $X_j = I_j B_j$. Then

$$P_{X_j}(z) = 1 - q_j + q_j P_{B_j}(z).$$

The pgf corresponding to (6.35) is

$$P_S(z) = \prod_{j=1}^{n} [1 - q_j + q_j P_{B_j}(z)].$$

Although it is possible to extend the exact computational methods to this
case, it is quite cumbersome. However, the compound Poisson approximation
based on matching of moments (6.50) simply requires replacing $z^{b_j}$ by $P_{B_j}(z)$.
Then, the pgf of the severity distribution (6.51) becomes

$$P_X(z) = \frac{1}{\lambda} \sum_{j=1}^{n} \lambda_j P_{B_j}(z)$$

and so

$$f_X(x) = \frac{1}{\lambda} \sum_{j=1}^{n} \lambda_j f_{B_j}(x), \tag{6.54}$$

which is a weighted average of the $n$ individual severity densities. Extensions
to continuous severity distributions also satisfy (6.54) for all values of $x$.


Example 6.35 (Example 6.29 continued) Develop compound Poisson
approximations using all three methods suggested here. Compute the mean and
variance for each approximation and compare it to the exact value.

Using the method that matches the mean, we have $\lambda = 50(0.01) + 25(0.01) =
0.75$. The severity distribution is

$$\begin{aligned}
f_X(50{,}000) &= \frac{50(0.01)(0.7)}{0.75} = 0.4667, \\
f_X(75{,}000) &= \frac{25(0.01)(0.7)}{0.75} = 0.2333, \\
f_X(100{,}000) &= \frac{50(0.01)(0.3)}{0.75} = 0.2000, \\
f_X(150{,}000) &= \frac{25(0.01)(0.3)}{0.75} = 0.1000.
\end{aligned}$$

The mean is $\lambda E(X) = 0.75(75{,}833.33) = 56{,}875$, which matches the exact
value, and the variance is $\lambda E(X^2) = 0.75(6{,}729{,}166{,}667) = 5{,}046{,}875{,}000$,
which exceeds the exact value.

For the method that preserves the probability of no losses, $\lambda = -75 \ln(0.99)
= 0.753775$. For this method, the severity distribution turns out to be exactly
the same as before (this is due to the fact that all individuals have the same
value of $q_j$). Thus the mean is 57,161 and the variance is 5,072,278,876, both
of which exceed the previous approximate values.

Using Kornya's method, $\lambda = 75(0.01)/0.99 = 0.757576$ and again the
severity distribution is unchanged. The mean is 57,449 and the variance is
5,097,853,535, which are the largest values of all. □
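The three sets of moments in Example 6.35 can be checked with a short Python sketch that builds the severity pf from (6.54) and uses the compound Poisson mean $\lambda E(X)$ and variance $\lambda E(X^2)$. The function name `compound_poisson_moments` and the layout of `GROUPS` are illustrative choices.

```python
import math

Q = 0.01   # probability of death, the same for all 75 employees
# (number of lives, {benefit amount: probability given a death})
GROUPS = [
    (50, {50_000: 0.7, 100_000: 0.3}),
    (25, {75_000: 0.7, 150_000: 0.3}),
]


def compound_poisson_moments(rate):
    """Return (lambda, mean, variance) when every life has lambda_j = rate(Q)."""
    lam_j = rate(Q)
    lam = sum(n for n, _ in GROUPS) * lam_j
    f_x = {}                               # severity pf from (6.54)
    for n, severity in GROUPS:
        for amount, p in severity.items():
            f_x[amount] = f_x.get(amount, 0.0) + n * lam_j * p / lam
    first = sum(a * p for a, p in f_x.items())
    second = sum(a * a * p for a, p in f_x.items())
    return lam, lam * first, lam * second


match_mean = compound_poisson_moments(lambda q: q)
match_no_loss = compound_poisson_moments(lambda q: -math.log(1 - q))
kornya = compound_poisson_moments(lambda q: q / (1 - q))
for name, values in [("mean", match_mean), ("no-loss", match_no_loss), ("Kornya", kornya)]:
    print(name, values)
```

Because all lives share the same $q_j$, the severity pf is identical in the three calls and only $\lambda$ changes, which is why the three sets of moments are ordered as described in the example.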


6.11.4 Exercises 
6.60 Derive (6.36) and (6.37). 


6.61 Demonstrate that the compound Poisson model given by $\lambda_j = q_j$ and
(6.52) produces a model with the same mean as the exact distribution but
with a larger variance. Then show that the one using $\lambda_j = -\ln(1 - q_j)$ must
produce a larger mean and even larger variance, and finally show that the one
using $\lambda_j = q_j/(1 - q_j)$ must produce the largest mean and variance of all.


6.62 Individual members of an insured group have independent claims. The
claim distribution has the statistics given in Table 6.25.

The premium for a group with future claims $S$ is the mean of $S$ plus 2 times
the standard deviation of $S$. If the genders of the members of a group of $m$
members are not known, the number of males is assumed to have a binomial
distribution with parameters $m$ and $q = 0.4$. Let $A$ be the premium for a
group of 100 for which the genders of the members are not known and let $B$
be the premium for a group of 40 males and 60 females. Determine $A/B$.


Table 6.25 Data for Exercise 6.62

            Mean   Variance
Males       2      4
Females     4      10


6.63 An insurance company assumes claim probabilities for persons covered
by its group life insurance contracts as given in Table 6.26.

A group of mutually independent lives has coverage of 1,000 per life. The
company assumes that 20% of the lives are smokers. Based on this assumption,
the premium is set equal to 110% of expected claims. If 30% of the lives are
smokers, the probability that claims will exceed the premium is less than
0.20. Using the normal approximation, determine the minimum number of
lives which must be in the group.


Table 6.26 Data for Exercise 6.63

              Probability
Class         of claim
Smoker        0.02
Nonsmoker     0.01


6.64 Based on the individual risk model with independent claims, the cumu- 
lative distribution function of aggregate claims for a portfolio of life insurance 
policies is as in Table 6.27. 

One policy with face amount 100 and probability of claim 0.20 is increased 
in face amount to 200. Determine the probability that aggregate claims for 
the revised portfolio will not exceed 500. 


6.65 A group life insurance contract covering independent lives is rated in 
the three age groupings as given in Table 6.28. The insurer prices the contract 


Table 6.27 Distribution for Exercise 6.64 


x F(x) 
0 0.40 
100 0.58 
200 0.64 
300 0.69 
400 0.70 
500 0.78 
600 0.96 
700 1.00 


Table 6.28 Data for Exercise 6.65

Age     Number in   Probability of   Mean of the exponential
group   age group   claim per life   distribution of claim amounts
18-35   400         0.03             5
36-50 300 0.07 
51-65 200 0.10 


bo w

Table 6.29 Data for Exercise 6.66

                                 Distribution of annual charges
                  Probability    given that a claim occurs
Service           of claim       Mean      Variance
Office visits     0.7            160       4,900
Surgery           0.2            600       20,000
Other services    0.5            240       8,100


so that the probability that claims will exceed the premium is 0.05. Using the 
normal approximation, determine the premium that the insurer will charge. 


6.66 The probability model for the distribution of annual claims per member 
in a health plan is shown in Table 6.29. Independence of costs and occurrences 
among services and members is assumed. Using the normal approximation, 
determine the minimum number of members that a plan must have such that 
the probability that actual charges will exceed 115% of the expected charges 
is less than 0.10. 


6.67 An insurer has a portfolio of independent risks as given in Table 6.30.
The insurer sets $\alpha$ and $k$ such that aggregate claims have expected value
100,000 and minimum variance. Determine $\alpha$.


Table 6.30 Data for Exercise 6.67

               Probability              Number
Class          of claim      Benefit    of risks
Standard       0.2           k          3,500
Substandard    0.6           αk         2,000


6.68 An insurance company has a portfolio of independent one-year term life
policies as given in Table 6.31. The actuary approximates the distribution of
claims in the individual model using the compound Poisson model in which the
expected number of claims is the same as in the individual model. Determine
the maximum value of $x$ such that the variance of the compound Poisson
approximation is less than 4,500.


Table 6.31 Data for Exercise 6.68

         Number     Benefit   Probability
Class    in class   amount    of a claim
1        500        x         0.01
2        500        2x        0.02


6.69 An insurance company sold one-year term life insurance on a group of
2,300 independent lives as given in Table 6.32.

The insurance company reinsures amounts in excess of 100,000 on each life.
The reinsurer wishes to charge a premium that is sufficient to guarantee that
it will lose money 5% of the time on such groups. Obtain the appropriate
premium by each of the following ways.

(a) Using a normal approximation to the aggregate claims distribution.

(b) Using a lognormal approximation.

(c) Using a gamma approximation.

(d) Using the compound Poisson approximation which matches the
means.

(e) Carrying out the calculations exactly (using the method developed
by De Pril or some other method). This requires a computer.


Table 6.32 Data for Exercise 6.69

Class   Benefit amount   Probability of death   Number of policies
1       100,000          0.10                   500
2       200,000          0.02                   500
3       300,000          0.02                   500
4       200,000          0.10                   300
5       200,000          0.10                   500


Discrete-time ruin models


7.1 INTRODUCTION

The risk assumed with a portfolio of insurance contracts is difficult to assess,
but it is nevertheless important to attempt to do so in order to ensure the
viability of an insurance operation. The distribution of total claims over a
fixed period of time is an obvious input parameter to such a process, and this
quantity has been the subject of the previous chapters.

In this chapter we take a multiperiod approach in which the fortunes of
the policy, portfolio, or company are followed over time. The most common
use of this approach is ruin theory, in which the quantity of interest is the
amount of surplus, with ruin occurring when the surplus becomes negative.
In order to track surplus we must model more than the claim payments. We
must include premiums, investment income, and expenses, along with any
other item that impacts the cash flow.

The models described in this and the next chapter are quite simple and
idealized in order to maintain mathematical simplicity. Consequently the
output from the analysis should not be viewed as a representation of absolute
reality, but rather as important additional information on the risk associated
with the portfolio of business. Such information is useful for long-run financial
planning and maintenance of the insurer's solvency.

Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.

This chapter is organized into two parts. The first part (Section 7.2)
introduces process models. The appropriate definitions are made and the terms
of ruin theory defined. The second part (Section 7.3) analyzes discrete-time
models. This can be done with the tools presented in the previous chapters.
An analysis of continuous-time models is covered in the next chapter. This
requires an introduction to stochastic processes. Two processes are analyzed:
the compound Poisson process and Brownian motion. The compound Poisson
process has been the standard model for ruin analysis in actuarial science,
while Brownian motion has found considerable use in modern financial theory
and also can be used as an approximation to the compound Poisson process.

Total loss 
N 
oO 
3 
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Fig. 7.1 Continuous total loss process, Sz- 
7.2.1 Processes : l 


The major difference between this chapter and the earlier ones is that we now 
want to view the evolution of the portfolio over time. With that in mind, 
we define two kinds of processes. We note that, while processes that involve 
random events are usually called stochastic processes, we will not employ 
the modifier “stochastic” and instead trust that the context will make it clear 
which processes are random and which are not. 


, This property indicates that the movement in the process in any one period 
is independent of the movement in a different, nonoverlapping period. 


Definition 7.4 A process has stationary increments if the distribution of 
Xı— X, depends on t and s only through the difference t — s. 


This property implies that the movement does not depend on the date. In 
other words, you cannot tell what time it is by looking at increments in the 
process. ' 

Most business organizations do not continuously monitor their status. In- 
stead, it is checked at regular intervals. This leads to the other kind of process. 


Definition 7.1 A continuous-time process is denoted {Xnt >= O}. If 
there are random elements, it is sufficient to specify the joint distribution of 
(Xni Ktn) for all ti,..-,tn and any n. 


In general, it is insufficient to describe the process by specifying the distri- 
bution of X; for arbitrary t. Many processes have correlations between the 
values observed at different times. 


Definition 7.5 A discrete-time process is denoted by {X;;t = 0,1,2,.. ahs 
If there are random elements, it is sufficient to specify the joint distribution 


Xe ou j ; 
Example 7.2 Let {S4 t 2 0} be the total losses paid from time 0 to time t. of (Xtys+++»Xtq) for integer t; and any n. 


Indicate how the collective risk model of Chapter 6 may be used to describe 
this process. 


For the joint distribution of $(S_{t_1}, \ldots, S_{t_n})$, suppose $t_1 < \cdots < t_n$. Let 
$W_j = S_{t_j} - S_{t_{j-1}}$ with $S_{t_0} = S_0 = 0$. Let the $W_j$ have independent 
distributions given by the collective risk model. The individual loss distributions 
could be identical while the frequency distribution would have a mean that 
is proportional to the length of the time period, $t_j - t_{j-1}$. An example of a 
realization of this process (called a sample path) is given in Figure 7.1. □ 


A discrete-time process can be derived from a continuous-time process by 
just writing down the values of $X_t$ at integral times. In this chapter, all 
discrete-time processes will take measurements at the end of each observation 
period, such as a month, quarter, or year. 


Example 7.6 (Example 7.2 continued) Convert the process to a discrete time 
process with stationary, independent increments. 


Let $X_1, X_2, \ldots$ be the amount of the total losses in each period where the 
$X_j$s are i.i.d. and each $X_j$ has a compound distribution. Then let the total loss 
process be $S_t = X_1 + \cdots + X_t$. The process has stationary increments because 
$S_t - S_s = X_{s+1} + \cdots + X_t$ and its distribution depends only on the number of 
$X_j$s, which is $t - s$. The property of independent increments follows directly 
from the independence of the $X_j$s. Figure 7.2 is the discrete-time version of 
Figure 7.1. □ 
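The construction in Example 7.6 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the example: each period's loss is taken to be compound Poisson with exponential severities, and the names `poisson_draw`, `period_loss`, and `total_loss_path` as well as the parameter values are hypothetical.

```python
import math
import random

def poisson_draw(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def period_loss(rng, claim_rate, mean_severity):
    """One period's aggregate loss X_j: Poisson claim count, exponential severities (assumed)."""
    n_claims = poisson_draw(rng, claim_rate)
    return sum(rng.expovariate(1.0 / mean_severity) for _ in range(n_claims))

def total_loss_path(rng, n_periods, claim_rate=2.0, mean_severity=1.0):
    """Return [S_0, S_1, ..., S_n] with S_0 = 0 and S_t = X_1 + ... + X_t."""
    path = [0.0]
    for _ in range(n_periods):
        path.append(path[-1] + period_loss(rng, claim_rate, mean_severity))
    return path

rng = random.Random(2024)
path = total_loss_path(rng, n_periods=5)
# The increments S_t - S_{t-1} are the i.i.d. period losses X_t.
increments = [later - earlier for earlier, later in zip(path, path[1:])]
```

Because every increment is an independent draw from the same compound distribution, the simulated path has stationary, independent increments by construction.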


It is usually easier to describe a process if it does not change much over 
time. Two specific ways in which this can happen are defined below. 


Definition 7.3 A process has independent increments if the random 
variables $X_t - X_s$ and $X_u - X_v$ are independent whenever $s < t \le v < u$. 


Fig. 7.2 Discrete total loss process, $S_t$. 


7.2.2 An insurance model 


The earlier examples have already illustrated most of the model we will use for 
the insurance process. We are interested in the surplus process $\{U_t; t \ge 0\}$ 
(or perhaps its discrete-time version, $\{U_t; t = 0, 1, \ldots\}$), which measures the 
surplus of the portfolio at time $t$. We begin at time zero with $u = U_0$, the 
initial surplus. We think of the surplus in the accounting sense in that it 
represents excess funds that would not be needed if the portfolio terminated 
today. For an ongoing concern, a positive value provides protection against 
adversity. The surplus at time $t$ is 

$$U_t = U_0 + P_t - S_t,$$

where $\{P_t; t \ge 0\}$ is the premium process, which measures all premiums (net 
of expenses) collected up to time $t$, and $\{S_t; t \ge 0\}$ is the loss process, which 
measures all losses paid up to time $t$. We make the following observations: 


1. $P_t$ may be written or earned premiums, as appropriate; 

2. $S_t$ may be paid or incurred losses, again, as appropriate; and 

3. $P_t$ may depend on $S_u$ for $u < t$. For example, dividends based on 
favorable past loss experience may reduce the current premium. 


It is possible, though not necessary, to separate the frequency and severity 
components of $S_t$. Let $\{N_t; t \ge 0\}$ be the claims process, which records the 
number of claims as of time $t$. Then let $S_t = X_1 + \cdots + X_{N_t}$. The sequence 
$\{X_1, X_2, \ldots\}$ need not consist of i.i.d. variables. However, if it does and 
the sequence is independent of $N_t$ for all $t$, then $S_t$ will have a compound 
distribution. 

We now look at two special cases of the surplus process. These are the only 
ones that will be studied. 
7.2.2.1 A discrete-time model Let the increment in the surplus process in 
year $t$ be defined as 

$$W_t = P_t - P_{t-1} - S_t + S_{t-1}, \quad t = 1, 2, \ldots.$$

Then the progression of surplus is 

$$U_t = U_{t-1} + W_t, \quad t = 1, 2, \ldots.$$

It will be relatively easy to learn about the distribution of $\{U_t; t = 1, 2, \ldots\}$ 
provided that the random variable $W_t$ either is independent of the other $W_t$s 
or only depends on the value of $U_{t-1}$. The dependency of $W_t$ on $U_{t-1}$ allows 
us to pay a dividend based on the surplus at the end of the previous year 
(because $W_t$ depends on $P_t$). 

In Section 7.3, two methods of determining the distribution of the $U_t$s will 
be presented. These will be computationally intensive, but given enough time 
and resources, the answers are easy to obtain. 


7.2.2.2 A continuous-time model In most cases it is extremely difficult to 
analyze continuous-time models. This is because the joint distribution must 
be developed at every time point, not just at a countable set of time points. 
One model that has been extensively analyzed is the compound Poisson claim 
process where premiums are collected at a constant continuous nonrandom 
rate, 

$$P_t = (1 + \theta)E(S_1)t,$$

and the total loss process is $S_t = X_1 + \cdots + X_{N_t}$, where $\{N_t; t \ge 0\}$ is the 
Poisson process. This process is discussed in detail in Section 8.1.1. It suffices 
for now to note that for any period the number of losses has the Poisson 
distribution with a mean that is proportional to the length of the time period. 

Because this model is more difficult to work with, the entire next chapter 
will be devoted to its development and analysis. We are now ready to define 
the quantity of interest, the one that measures the portfolio’s chance for 
success. 


7.2.3 Ruin 


The main purpose for building a process model is to determine if the portfolio 
will survive over time. The probability of survival can be defined in four 
different ways. 


Definition 7.7 The continuous-time, infinite-horizon survival 
probability is given by 

$$\phi(u) = \Pr(U_t \ge 0 \text{ for all } t \ge 0 \mid U_0 = u).$$


Here we continuously check the surplus and demand that the portfolio 
be solvent forever. Both continuous checking and a requirement that the 
portfolio survive forever are unrealistic. In practice, it is more likely that 
surplus is checked at regular intervals. While we would like our portfolio to 
last forever, it is too much to ask that our model be capable of forecasting 
infinitely far into the future. A more useful quantity follows. 


Definition 7.8 The discrete-time, finite-horizon survival probability 
is given by 

$$\tilde\phi(u, \tau) = \Pr(U_t \ge 0 \text{ for all } t = 0, 1, \ldots, \tau \mid U_0 = u).$$

Here the portfolio is required to survive $\tau$ periods (usually years), and we 
only check at the end of each period. There are two intermediate cases. 


Definition 7.9 The continuous-time, finite-horizon survival probability 
is given by 

$$\phi(u, \tau) = \Pr(U_t \ge 0 \text{ for all } 0 \le t \le \tau \mid U_0 = u)$$

and the discrete-time, infinite-horizon survival probability is given by 

$$\tilde\phi(u) = \Pr(U_t \ge 0 \text{ for all } t = 0, 1, \ldots \mid U_0 = u).$$

It should be clear that the following inequalities hold: 

$$\tilde\phi(u, \tau) \ge \tilde\phi(u) \ge \phi(u)$$

and 

$$\tilde\phi(u, \tau) \ge \phi(u, \tau) \ge \phi(u).$$

There are also some limits that should be equally obvious. They are 

$$\lim_{\tau \to \infty} \phi(u, \tau) = \phi(u)$$

and 

$$\lim_{\tau \to \infty} \tilde\phi(u, \tau) = \tilde\phi(u).$$


In many cases, convergence is rapid. This means that the choice of a finite 
or infinite horizon should depend as much on the ease of calculation as on 
the appropriateness of the model. We will find that, if the Poisson process 
holds, infinite-horizon probabilities are easier to obtain. For other cases, the 
finite-horizon calculation may be easier. 

Although we have not defined notation to express them, there is another 
pair of limits. As the frequency with which surplus is checked increases (that 
is, the number of times per year), the discrete-time survival probabilities con- 
verge to their continuous-time counterparts. 

As this subsection refers to ruin, we close by defining the probability of 
ruin. 

Definition 7.10 The continuous-time, infinite-horizon ruin probability 
is given by 

$$\psi(u) = 1 - \phi(u)$$

and the other three ruin probabilities are defined and denoted in a similar 
manner. 
7.3 DISCRETE, FINITE-TIME RUIN PROBABILITIES 


7.3.1 The discrete-time process 


Let $P_t$ be the premium collected in the $t$th period and let $S_t$ be the losses 
paid in the $t$th period. We also add one generalization. Let $C_t$ be any cash 
flow other than the collection of premiums and the payment of losses. The 
most significant cash flow is the earning of investment income on the surplus 
available at the beginning of the period. The surplus at the end of the $t$th 
period is then 

$$U_t = u + \sum_{j=1}^{t} (P_j + C_j - S_j) = U_{t-1} + P_t + C_t - S_t.$$

The final assumption is that, given $U_{t-1}$, the random variable $W_t = P_t + C_t - S_t$ 
depends only upon $U_{t-1}$ and not upon any other previous experience. This 
makes $\{U_t; t = 1, 2, \ldots\}$ a Markov process. 

In order to evaluate ruin probabilities, we consider a second process defined 
as follows. First, define 

$$W_t^* = \begin{cases} 0, & U_{t-1}^* < 0, \\ W_t, & U_{t-1}^* \ge 0, \end{cases} \qquad U_t^* = U_{t-1}^* + W_t^*, \tag{7.1}$$

where the new process starts with $U_0^* = u$. In this case, the finite-horizon 
survival probability is 

$$\tilde\phi(u, \tau) = \Pr(U_\tau^* \ge 0).$$

The reason we need only check $U_t^*$ at time $\tau$ is that, once ruined, this process 
is not allowed to become nonnegative. The following example illustrates this 
distinction and is a preview of the method presented in detail in the next 
subsection. 


Example 7.11 Consider a process with an initial surplus of 2, a fixed annual 
premium of 3, and losses of either 0 or 6 with probabilities 0.6 and 0.4, 
respectively. There are no other cash flows. Determine $\tilde\phi(2, 2)$. 


There are only two possible values for $U_1^*$, 5 and -1, with probabilities 0.6 
and 0.4. In each year, $W_t$ takes the values 3 and -3 with probabilities 0.6 and 
0.4. For year 2, there are four possible ways for the process to end. They are 
listed in Table 7.1. Then, $\tilde\phi(2, 2) = 0.36 + 0.24 = 0.60$. Note that for $U_2$ the 
process would continue for cases 3 and 4, producing values of 2 and -4. But 
our process is not allowed to recover from ruin and so case 3 must be forced 
to remain negative. □ 
Table 7.1 Calculations for Example 7.11 


Case   $U_1^*$   $W_2$   $W_2^*$   $U_2^*$   Probability 
1        5        3       3         8        0.36 
2        5       -3      -3         2        0.24 
3       -1        3       0        -1        0.24 
4       -1       -3       0        -1        0.16 
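The four cases of Table 7.1 can be reproduced by brute-force enumeration of the loss paths. The sketch below is an illustration only; the function name `frozen_survival_probability` is hypothetical, and it assumes a constant premium and i.i.d. annual losses, as in Example 7.11.

```python
from itertools import product

def frozen_survival_probability(initial_surplus, premium, loss_pf, n_periods):
    """Pr(U*_tau >= 0) for the frozen process of (7.1), by enumerating all loss paths."""
    survival = 0.0
    for losses in product(loss_pf.items(), repeat=n_periods):
        surplus, path_probability = initial_surplus, 1.0
        for loss, probability in losses:
            path_probability *= probability
            if surplus >= 0:
                # W*_t = W_t only while the process has not yet been ruined.
                surplus += premium - loss
            # Otherwise W*_t = 0 and the surplus stays frozen below zero.
        if surplus >= 0:
            survival += path_probability
    return survival

# Example 7.11: initial surplus 2, premium 3, losses 0 or 6.
example_7_11 = frozen_survival_probability(2, 3, {0: 0.6, 6: 0.4}, 2)
```

Enumeration grows exponentially in the number of periods, which is why the recursive methods of the next subsection are preferred for anything beyond a toy case.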


7.3.2 Evaluating the probability of ruin 


There are three ways to evaluate the ruin probability. One way that is al- 
ways available is simulation. Just as the aggregate loss distribution can be 
simulated, the progress of surplus can also be simulated. For extremely com- 
plicated models (for example, one encompassing medical benefits, including 
hospitalization, prescription drugs, and outpatient visits as well as random 
inflation, interest rates, and utilization rates), this may be the only way to 
proceed. For more modest settings the other two methods will work well. The 
first is a brute force method that has few restrictions, and the second is an 
inversion method that has some restrictions. 


7.3.2.1 Evaluation by convolutions For any practical use of this method, the 
distributions of all the random variables involved should be discrete and have 
finite support. If they are not, some discrete approximation should be 
constructed. The calculation is done recursively, using (7.1). For notational 
purposes, suppose we have obtained the discrete pf of $U_{t-1}^*$. Then the ruin 
probability is $\tilde\psi(u, t-1) = \Pr(U_{t-1}^* < 0)$ and the distribution of nonnegative surplus 
is $f_j = \Pr(U_{t-1}^* = u_j)$, $j = 1, 2, \ldots, n$, where $u_j \ge 0$ for all $j$ and $u_n$ is the 
largest possible value of $U_{t-1}^*$. We have assumed that for each nonnegative value 
of $U_{t-1}^*$, the distribution of $W_t$ is known. Let $g_{j,k} = \Pr(W_t = w_{j,k} \mid U_{t-1}^* = u_j)$. 
We have left open the possibility that even the values of $W_t$ may depend on $u_j$. 
We then obtain the probabilities of $U_t^*$ by convolution. First, 

$$\begin{aligned}
\tilde\psi(u, t) &= \tilde\psi(u, t-1) + \Pr(U_{t-1}^* \ge 0 \text{ and } U_{t-1}^* + W_t < 0) \\
&= \tilde\psi(u, t-1) + \sum_{j=1}^{n} \Pr(U_{t-1}^* + W_t < 0 \mid U_{t-1}^* = u_j) \Pr(U_{t-1}^* = u_j) \\
&= \tilde\psi(u, t-1) + \sum_{j=1}^{n} \Pr(u_j + W_t < 0 \mid U_{t-1}^* = u_j) f_j \\
&= \tilde\psi(u, t-1) + \sum_{j=1}^{n} \sum_{w_{j,k} < -u_j} g_{j,k} f_j.
\end{aligned}$$
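Then, 

$$\begin{aligned}
\Pr(U_t^* = x) &= \Pr(U_{t-1}^* \ge 0 \text{ and } U_{t-1}^* + W_t = x) \\
&= \sum_{j=1}^{n} \Pr(U_{t-1}^* \ge 0 \text{ and } U_{t-1}^* + W_t = x \mid U_{t-1}^* = u_j) \Pr(U_{t-1}^* = u_j) \\
&= \sum_{j=1}^{n} \Pr(u_j + W_t = x \mid U_{t-1}^* = u_j) f_j \\
&= \sum_{j=1}^{n} \sum_{w_{j,k} + u_j = x} g_{j,k} f_j.
\end{aligned}$$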


Although these formulas look a bit intimidating, they are fairly easy to 
implement. Consider the following example. 


Example 7.12 Suppose that annual losses can assume the values 0, 2, 4, and 
6, with probabilities 0.4, 0.3, 0.2, and 0.1, respectively. Further suppose that 
the initial surplus is 2, and a premium of 2.5 is collected at the beginning of 
each year. Interest is earned at 10% on any surplus available at the beginning 
of the year because losses are paid at the end of the year. In addition, a 
rebate of 0.5 is given in any year in which there are no losses. Determine the 
survival probability at the end of each of the first two years. 


First note that the rebate cannot be such that it is applied to the next 
year’s premium. Doing so would require that we begin the year not only 
knowing the surplus but also if a rebate was to be provided. 

At time zero, $\tilde\psi(2, 0) = 0$ and $f_1 = \Pr(U_0^* = 2) = 1$. The possible values of 
$W_1$ are given in Table 7.2 along with the probabilities, $g_{1,k}$. 

For example, $w_{1,1}$ is based on a premium of 2.5, interest of 0.45 (on the 
surplus after collection of the premium), a loss payment of 0, and a rebate of 
0.5. To evaluate $\tilde\psi(2, 1)$, observe that the only value of $w_{1,k}$ which is below 
$-u_1 = -2$ is $w_{1,4}$ and so $\tilde\psi(2, 1) = 0.1$. It is also easy to see that the only 
values of $x$ that will have positive probability are those that have $2 + w_{1,k} \ge 0$. 
This gives the values for the distribution of $U_1^*$ as in Table 7.3. 


Table 7.2 $w$ and $g$ for Example 7.12 


$k$   $w_{1,k}$   $g_{1,k}$ 
1      2.45       0.4 
2      0.95       0.3 
3     -1.05       0.2 
4     -3.05       0.1 
Table 7.3 $U_1^*$ for Example 7.12 


$j$   $u_j$   $f_j$ 
1     0.95    0.2 
2     2.95    0.3 
3     4.45    0.4 


The remaining probability is at $\Pr(U_1^* = -1.05) = 0.1$. 


Table 7.4 $u + w$ and $g$ for Example 7.12 


$j$   $u_j$   $f_j$   $k = 1$       $k = 2$       $k = 3$        $k = 4$ 
1     0.95    0.2     3.295, 0.4    1.795, 0.3    -0.205, 0.2    -2.205, 0.1 
2     2.95    0.3     5.495, 0.4    3.995, 0.3     1.995, 0.2    -0.005, 0.1 
3     4.45    0.4     7.145, 0.4    5.645, 0.3     3.645, 0.2     1.645, 0.1 

One way to visualize year 2 is with a two-way table providing all of the 
combinations of $u_j$ and $w_{j,k}$. The entries in Table 7.4 are $u_j + w_{j,k}$, $g_{j,k}$. Only 
the sums need be presented because they are the only interesting quantities. 

The joint probability for any cell is the product of $f_j$ from that row and 
the probability in that cell. The addition to $\tilde\psi(2, 1)$ is the probability for all 
the cells with negative entries, that is, 

$$\tilde\psi(2, 2) = 0.1 + 0.2(0.2) + 0.2(0.1) + 0.3(0.1) = 0.19.$$

There are no duplicate values of $u_j + w_{j,k}$, so the best we can do is to put the 
values in order and note that they are the new $u_j$ values for the beginning of 
year 3. They are listed in Table 7.5. 


Table 7.5 $u$ for Example 7.12 


$j$   $u_j$    $f_j$ 
1     1.645    0.04 
2     1.795    0.06 
3     1.995    0.06 
4     3.295    0.08 
5     3.645    0.08 
6     3.995    0.09 
7     5.495    0.12 
8     5.645    0.12 
9     7.145    0.16 
Table 7.6 Probabilities for Example 7.13 


$j$   $u_j$   $f_j$ 
1     0       0.0134 
2     2       0.189225 
3     4       0.258975 
4     6       0.2568 
5     8       0.0916 


The probabilities in Table 7.5 total 0.81, the complement of $\tilde\psi(2, 2)$. By the earlier 
definition, the remaining 0.19 probability is associated with $U_2^* < 0$. □ 


It should be easy to see that the number of possible u values as well as 
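The two years of Example 7.12 can be checked with a short convolution routine. This is a sketch under the example's stated cash flows (premium at the start of the year, 10% interest on surplus plus premium, loss and no-loss rebate at year end); the names `end_of_year_surplus` and `one_year_step` are hypothetical, and values are rounded to six decimals only to keep floating-point keys stable.

```python
from collections import defaultdict

LOSS_PF = {0: 0.4, 2: 0.3, 4: 0.2, 6: 0.1}
PREMIUM, INTEREST, REBATE = 2.5, 0.10, 0.5

def end_of_year_surplus(start_surplus, loss):
    """Surplus after premium, interest, the loss payment, and any no-loss rebate."""
    grown = (start_surplus + PREMIUM) * (1 + INTEREST)
    rebate = REBATE if loss == 0 else 0.0
    return round(grown - loss - rebate, 6)

def one_year_step(ruin_probability, surplus_pf):
    """One pass of the recursion: updated ruin probability and pf of nonnegative surplus."""
    new_pf = defaultdict(float)
    for u_j, f_j in surplus_pf.items():
        for loss, g_jk in LOSS_PF.items():
            x = end_of_year_surplus(u_j, loss)
            if x < 0:
                ruin_probability += f_j * g_jk   # this cell adds to the ruin probability
            else:
                new_pf[x] += f_j * g_jk          # this cell feeds next year's u values
    return ruin_probability, dict(new_pf)

ruin_1, pf_1 = one_year_step(0.0, {2.0: 1.0})
ruin_2, pf_2 = one_year_step(ruin_1, pf_1)
```

Here `pf_1` corresponds to Table 7.3 and `pf_2` to Table 7.5, while `ruin_1` and `ruin_2` are the ruin probabilities after one and two years.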
the number of decimal places can increase rapidly. At some point, rounding 
would seem to be a good idea. A simple way to do this is to demand that at 
each period the only allowable u values are some multiple of h, a span that 
may need to increase from period to period. When probability is assigned to 
some value that is not a multiple of h, it is distributed to the two nearest 
values in a way that will preserve the mean (spreading to more values could 
preserve higher moments). 


Example 7.13 (Example 7.12 continued) Distribute the probabilities for the 
surplus at the end of year 2 using a span of h = 2. 


The probability of 0.04 at 1.645 must be distributed to the points 0 and 
2. To preserve the mean, 0.355(0.04)/2 = 0.0071 is placed at zero and the 
remaining 0.0329 is placed at 2. The expected value is 0.0071(0) +0.0329(2) = 
0.0658, which matches the original value of 0.04(1.645). The value 0.355 is 
the distance from the point in question (1.645) to the next span point (2), and 
the denominator is the span. The probability is then placed at the previous 
span point. The resulting approximate distribution is given in Table 7.6. □ 
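The mean-preserving rounding of Example 7.13 can be written as a small helper. The function name `spread_to_span` is hypothetical; the sketch assumes nonnegative surplus values and splits each probability between the two neighbouring multiples of the span.

```python
import math
from collections import defaultdict

def spread_to_span(surplus_pf, span):
    """Move each probability to the two neighbouring multiples of span, preserving the mean."""
    rounded = defaultdict(float)
    for value, probability in surplus_pf.items():
        lower = math.floor(value / span) * span
        upper = lower + span
        # Share sent to the previous span point: distance to the next point over the span.
        weight_lower = (upper - value) / span
        rounded[lower] += weight_lower * probability
        rounded[upper] += (1 - weight_lower) * probability
    return dict(rounded)

TABLE_7_5 = {1.645: 0.04, 1.795: 0.06, 1.995: 0.06, 3.295: 0.08, 3.645: 0.08,
             3.995: 0.09, 5.495: 0.12, 5.645: 0.12, 7.145: 0.16}
table_7_6 = spread_to_span(TABLE_7_5, span=2)
mean_before = sum(u * f for u, f in TABLE_7_5.items())
mean_after = sum(u * f for u, f in table_7_6.items())
```

Applied to Table 7.5 this returns the entries of Table 7.6, and `mean_before` equals `mean_after` up to floating-point error.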


7.3.2.2 Evaluation by inversion One of the strengths of the inversion method 
is that the act of computing a convolution is reduced to a few multiplications. 
This is true, provided that the random variables are independent. In this case 
that means that $W_t$ is independent of $U_{t-1}$. We use a different approach with 
regard to keeping track of ruin (earlier that was accomplished by freezing $U_t$ 
upon ruin). This idea could also be applied to the direct convolution approach. 
This time, let $U_t^{**}$ be $U_t$ conditioned on $U_t \ge 0$. At the end of each period, all 
probability associated with ruin is redistributed over the outcomes producing 
nonnegative surplus. The year-by-year analysis proceeds as follows: 


1. Determine $\varphi_{1,t}(z) = E(e^{izU_{t-1}^{**}})$, the characteristic function of $U_{t-1}^{**}$. 

2. Determine $\varphi_{2,t}(z) = E(e^{izW_t})$, the characteristic function of $W_t$. 

3. Then $\varphi_{3,t}(z) = \varphi_{1,t}(z) \cdot \varphi_{2,t}(z)$ is the characteristic function of $U_{t-1}^{**} + W_t$. 

4. Use inversion to determine $f_t(u)$, the pf of $U_{t-1}^{**} + W_t$. 

5. Let $r_t = \Pr(U_{t-1}^{**} + W_t < 0)$. This is the probability that, given survival 
to time $t - 1$, the portfolio is ruined at time $t$. 

6. Then $f_t^{**}(u) = f_t(u)/(1 - r_t)$ for $u \ge 0$ is the pf of $U_t^{**}$. 

7. The probability of ruin by time $t$ is then $\tilde\psi(u, t) = \tilde\psi(u, t-1) + r_t[1 - \tilde\psi(u, t-1)]$. 


The process is initiated by noting that the pf of $U_1$ can be obtained directly 
by observing that $U_1 = u + W_1$, so all that needs to be done is to shift the 
arguments of the pf of $W_1$ by $u$. 


Example 7.14 Aggregate losses for one year are 0, 2, 4, and 6 with proba- 
bilities 0.4, 0.3, 0.2, and 0.1, respectively. Premiums of 2.5 are collected at 
the beginning of the year and initial surplus is 2. Determine the probability of 
ruin within the first two years using the Fast Fourier Transform (FFT). 


The pf of $W_t$ is the same in all years and is given in Table 7.7. With an 
initial surplus of 2 it is easy to obtain the distribution of $U_1$. It is given in 
Table 7.8. This immediately gives $\tilde\psi(2, 1) = 0.1$ and the distribution of $U_1^{**}$ 
is given in Table 7.9. 


Table 7.7 pf of $W_t$ for Example 7.14 


$w$     $\Pr(W = w)$ 
-3.5    0.1 
-1.5    0.2 
 0.5    0.3 
 2.5    0.4 


Table 7.8 pf of $U_1$ for Example 7.14 


$u$     $\Pr(U_1 = u)$ 
-1.5    0.1 
 0.5    0.2 
 2.5    0.3 
 4.5    0.4 


Table 7.9 pf of $U_1^{**}$ for Example 7.14 


$u$     $\Pr(U_1^{**} = u)$ 
0.5     2/9 
2.5     3/9 
4.5     4/9 


In order for the FFT to work in a simple manner, it is best to have all 
amounts be positive. This can be accomplished by adding 3.5 to each variable. 
The shifted distributions are given in the second and third columns of Table 
7.10. Anticipating that the shifted $U_1^{**} + W_2$ will take on values from 0 to 14 
with a span of 2, we observe that eight values are required. This is already 
a power of 2, so no extra zeros need be added. In Table 7.10 the fourth and 
fifth columns provide the FFT of the two input variables. They are followed 
by the product of the two characteristic functions and then ultimately by the 
inverse of this characteristic function. The last column is the pf we seek. Of 
course, in this case it would have been trivial to perform the convolutions, 
but this way we can also verify that the FFT and its inverse do what they are 
supposed to. 


Table 7.10 Year 2 ruin calculation for Example 7.14 


$u$   $f_1^{**}(u)$   $f_W(u)$   $\varphi_{1,2}/8$       $\varphi_{2,2}/8$ 
0     0       1/10    0.125                  0.125 
2     0       2/10   -0.08502 - 0.05724i    -0.00518 - 0.09053i 
4     2/9     3/10    0.02778 + 0.04167i    -0.025 + 0.025i 
6     3/9     4/10   -0.02609 - 0.00169i     0.03018 - 0.01553i 
8     4/9     0       0.04167               -0.025 
10    0       0      -0.02609 + 0.00169i     0.03018 + 0.01553i 
12    0       0       0.02778 - 0.04167i    -0.025 - 0.025i 
14    0       0      -0.08502 + 0.05724i    -0.00518 + 0.09053i 

$u$   $\varphi_{3,2}/64$       $f_2(u)$ 
0      0.01563               0 
2     -0.00474 + 0.00799i    0 
4     -0.00174 - 0.00035i    2/90 
6     -0.00081 + 0.00035i    7/90 
8     -0.00104               16/90 
10    -0.00081 - 0.00035i    25/90 
12    -0.00174 + 0.00035i    24/90 
14    -0.00474 - 0.00799i    16/90 




Table 7.11 Distribution of surplus after year 2 for Example 7.14 


$u$   $\Pr(U_1^{**} + W_2 = u)$   $\Pr(U_2^{**} = u)$ 
-3    2/90     0 
-1    7/90     0 
 1    16/90    16/81 
 3    25/90    25/81 
 5    24/90    24/81 
 7    16/90    16/81 


We must note that the probabilities in the last column of Table 7.10 are shifted by 7. 
The actual distribution of the sum is given in Table 7.11. We see that $9/90 = 1/10$ 
of the probability is associated with negative values and so $\tilde\psi(2, 2) = 0.1 + 
0.9(0.1) = 0.19$. The conditional distribution of $U_2^{**}$ is also given in Table 7.11. □ 
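The year-2 step of Example 7.14 can be reproduced with a discrete Fourier transform. The sketch below uses a plain O(n^2) transform written with the standard library in place of an FFT routine (an assumption made only to keep the example self-contained; an FFT returns the same numbers), and the names `dft`, `SHIFT`, and `SPAN` are hypothetical.

```python
import cmath

def dft(values, inverse=False):
    """Plain O(n^2) discrete Fourier transform; an FFT gives the same result faster."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                    for m, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

SHIFT = 3.5   # added to each variable so that all amounts are nonnegative
SPAN = 2
# Index m corresponds to the shifted amount m * SPAN (0, 2, ..., 14).
f_u1 = [0, 0, 2/9, 3/9, 4/9, 0, 0, 0]     # shifted pf of U_1**
f_w = [0.1, 0.2, 0.3, 0.4, 0, 0, 0, 0]    # shifted pf of W_2

product_cf = [a * b for a, b in zip(dft(f_u1), dft(f_w))]
f_2 = [x.real for x in dft(product_cf, inverse=True)]

# Undo both shifts to recover the actual values of U_1** + W_2.
actual_pf = {m * SPAN - 2 * SHIFT: p for m, p in enumerate(f_2)}
r_2 = sum(p for value, p in actual_pf.items() if value < 0)
ruin_1 = 0.1
ruin_2 = ruin_1 + r_2 * (1 - ruin_1)
conditional_pf = {v: p / (1 - r_2) for v, p in actual_pf.items() if v >= 0}
```

The list `f_2` matches the last column of Table 7.10, and `conditional_pf` matches the last column of Table 7.11. Eight points suffice because the largest shifted index is 4 + 3 = 7, so the circular convolution does not wrap around.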


7.3.3 Exercises 


7.1 The total claims paid in a year can be 0, 5, 10, 15, or 20 with probabilities 
0.4, 0.3, 0.15, 0.1, and 0.05, respectively. Annual premiums of 6 are collected 
at the beginning of each year. Interest of 10% is earned on any funds available 
at the beginning of the year, and claims are paid at the end of the year. 


(a) Determine $\tilde\psi(2, 3)$ exactly. 
(b) Determine $\tilde\psi(2, 3)$, but after the premium is added and the interest 
credited, discretize the resulting distribution with a span of 5. 


7.2 Repeat Exercise 7.1 using the FFT and rediscretizing to maintain a span 
of 5. 


Continuous-time 
ruin models 


8.1 INTRODUCTION 


In this chapter we turn to models that examine surplus continuously over time. 
Because these models tend to be difficult to analyze, we begin by restricting 
attention to models in which the number of claims has a Poisson distribution. 
In the discrete-time case we found that answers could be obtained by brute 
force. For the continuous case we find that exact, analytic solutions can be 
obtained for some situations, and that approximations and an upper bound 
can be obtained for many situations. In this section we introduce the Poisson 
process and the continuous-time approach to ruin. 


8.1.1 The Poisson process 


We consider the basic properties of the Poisson process $\{N_t; t \ge 0\}$ representing 
the number of claims on a portfolio of business. Thus, $N_t$ is the number 
of claims in $(0, t]$. A formal definition of a Poisson process is now given. 


Definition 8.1 The number-of-claims process $\{N_t; t \ge 0\}$ is a Poisson 
process with rate $\lambda > 0$ if the following three conditions hold: 


1. $N_0 = 0$. 

2. The process has stationary and independent increments. 

3. The number of claims in an interval of length $t$ is Poisson distributed 
with mean $\lambda t$. That is, for all $s, t > 0$ we have 

$$\Pr(N_{t+s} - N_s = n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}, \quad n = 0, 1, 2, \ldots. \tag{8.1}$$


Stationary increments means that the distribution of the number of 
claims in a fixed interval depends only on the length of the interval and not 
on when the interval occurs so, for example, there is no trend effect. In- 
dependent increments means that the number of claims in an interval is 
statistically independent of the number of claims in any previous interval (not 
overlapping the present interval). Together, stationary and independent 
increments imply that the process can be thought of intuitively as starting 
over at any point in time. Actually, the assumption of stationarity in Condi- 
tion 2 in the definition is not necessary because it is implied by Condition 3, 
but it is stated for clarity. 

An important property of the Poisson process is that the times between 
claims are independent and identically exponentially distributed, each with 
mean $1/\lambda$. To see this, let $W_j$ be the time between the $(j-1)$th and $j$th 
claims for $j = 1, 2, \ldots$, where $W_1$ is the time of the first claim. Then, 

$$\Pr(W_1 > t) = \Pr(N_t = 0) = e^{-\lambda t},$$

and so $W_1$ is exponential with mean $1/\lambda$. Also, 

$$\begin{aligned}
\Pr(W_2 > t \mid W_1 = s) &= \Pr(W_1 + W_2 > s + t \mid W_1 = s) \\
&= \Pr(N_{t+s} = 1 \mid N_s = 1) \\
&= \Pr(N_{t+s} - N_s = 0 \mid N_s = 1) \\
&= \Pr(N_{t+s} - N_s = 0)
\end{aligned}$$

because the increments are independent. From Condition 3, we then have 

$$\Pr(W_2 > t \mid W_1 = s) = e^{-\lambda t}.$$

Because this is true for all $s$, $\Pr(W_2 > t) = e^{-\lambda t}$ and $W_2$ is independent of 
$W_1$. Similarly, $W_3, W_4, W_5, \ldots$ are independent and exponentially distributed, 
each with mean $1/\lambda$. 

Finally, we remark that, from a fixed point in time $t_0 > 0$ the time until the 
next claim occurs is also exponentially distributed with mean $1/\lambda$, due to the 
memoryless property of the exponential distribution. That is, if the $n$th claim 
occurred $s$ time units before $t_0$, the probability that the next claim occurs at 
least $t$ time units after $t_0$ is $\Pr(W_{n+1} > t + s \mid W_{n+1} > s) = e^{-\lambda t}$, which is the 
same exponential survival function no matter what $s$ and $n$ happen to be. 
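The link between exponential waiting times and Poisson counts can be illustrated by simulation. This is a sketch only: the function name `claim_times`, the seed, and the parameter values are arbitrary choices made for the illustration.

```python
import math
import random

def claim_times(rng, rate, horizon):
    """Claim epochs in (0, horizon], built from i.i.d. exponential waiting times with mean 1/rate."""
    times, clock = [], rng.expovariate(rate)
    while clock <= horizon:
        times.append(clock)
        clock += rng.expovariate(rate)
    return times

rng = random.Random(7)
RATE, HORIZON, N_PATHS = 3.0, 2.0, 20000
counts = [len(claim_times(rng, RATE, HORIZON)) for _ in range(N_PATHS)]

average_count = sum(counts) / N_PATHS                # should be near RATE * HORIZON
share_with_no_claims = counts.count(0) / N_PATHS     # should be near exp(-RATE * HORIZON)
expected_no_claims = math.exp(-RATE * HORIZON)
```

With enough paths, `average_count` settles near the Poisson mean $\lambda t$ and `share_with_no_claims` near $\Pr(N_t = 0) = e^{-\lambda t}$, consistent with Condition 3.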
8.1.2 The continuous-time problem 


The model for claims payments will be the compound Poisson process. A 
formal definition follows. 


Definition 8.2 Let the number of claims process $\{N_t; t \ge 0\}$ be a Poisson 
process with rate $\lambda$. Let the individual losses $\{X_1, X_2, \ldots\}$ be independent 
and identically distributed positive random variables, independent of $N_t$, each 
with cumulative distribution function $F(x)$ and mean $\mu < \infty$. Thus $X_j$ is the 
amount of the $j$th loss. Let $S_t$ be the total loss in $(0, t]$. It is given by $S_t = 0$ 
if $N_t = 0$ and $S_t = \sum_{j=1}^{N_t} X_j$ if $N_t > 0$. Then, for fixed $t$, $S_t$ has a compound 
Poisson distribution. The process $\{S_t; t \ge 0\}$ is said to be a compound 
Poisson process. Because $\{N_t; t \ge 0\}$ has stationary and independent 
increments, so does $\{S_t; t \ge 0\}$. Also, 

$$E(S_t) = E(N_t)E(X_j) = (\lambda t)(\mu) = \lambda\mu t.$$

We assume that premiums are payable continuously at constant rate $c$ per 
unit time. That is, the total net premium in $(0, t]$ is $ct$ and we ignore interest 
for mathematical simplicity. We further assume that net premiums have a 
positive loading, that is, $ct > E(S_t)$, which implies that $c > \lambda\mu$. Thus let 

$$c = (1 + \theta)\lambda\mu, \tag{8.2}$$

where $\theta > 0$ is called the relative security loading or premium loading 
factor. 

For our model, we have now specified the loss and premium processes. The 
surplus process is thus 


$$U_t = u + ct - S_t, \quad t \ge 0,$$

where $u = U_0$ is the initial surplus. We say that ruin occurs if $U_t$ ever becomes 
negative, and survival occurs otherwise. Thus, the infinite-time survival 
probability is defined as 

$$\phi(u) = \Pr(U_t \ge 0 \text{ for all } t \ge 0 \mid U_0 = u),$$

and the infinite-time ruin probability is 

$$\psi(u) = 1 - \phi(u).$$

Our goal is to analyze $\phi(u)$ and/or $\psi(u)$. 
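A simulation gives a feel for this surplus process. The sketch below rests on assumptions that go beyond the text: claims are taken to be exponential, the horizon is truncated at a finite time (so the estimate is for finite-horizon ruin, which cannot exceed $\psi(u)$), and all names and parameter values are hypothetical. Because surplus rises between claims, ruin only needs to be checked at claim instants.

```python
import random

def max_net_loss(rng, rate, premium_rate, mean_claim, horizon):
    """Largest value of S_t - c*t over claim epochs in (0, horizon]; 0 if it never goes positive."""
    clock, total_claims, worst = 0.0, 0.0, 0.0
    while True:
        clock += rng.expovariate(rate)            # exponential time to the next claim
        if clock > horizon:
            return worst
        total_claims += rng.expovariate(1.0 / mean_claim)   # exponential claim size (assumed)
        worst = max(worst, total_claims - premium_rate * clock)

def estimated_ruin_probability(net_losses, initial_surplus):
    """Share of simulated paths on which U_t = u + c*t - S_t drops below zero before the horizon."""
    return sum(1 for loss in net_losses if loss > initial_surplus) / len(net_losses)

RATE, MEAN_CLAIM, THETA = 1.0, 1.0, 0.2
PREMIUM_RATE = (1 + THETA) * RATE * MEAN_CLAIM      # equation (8.2)
rng = random.Random(11)
net_losses = [max_net_loss(rng, RATE, PREMIUM_RATE, MEAN_CLAIM, horizon=100.0)
              for _ in range(5000)]
ruin_at_0 = estimated_ruin_probability(net_losses, 0.0)
ruin_at_5 = estimated_ruin_probability(net_losses, 5.0)
```

Storing the worst net loss per path lets the same simulated paths be reused for any initial surplus, so the estimates are automatically nonincreasing in $u$.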


8.2 THE ADJUSTMENT COEFFICIENT AND LUNDBERG’S 
INEQUALITY 


In this section we determine a special quantity and then show that it can be 
used to obtain a bound on the value of $\psi(u)$. While it is only a bound, it is 
easy to obtain, and as an upper bound it provides a conservative estimate. 

8.2.1 The adjustment coefficient 


It is difficult to motivate the definition of the adjustment coefficient from a 
physical standpoint, so we just state it. We adopt the notational convention 
that $X$ is an arbitrary claim size random variable in what follows. 


Definition 8.3 Let $t = \kappa$ be the smallest positive solution to the equation 

$$1 + (1 + \theta)\mu t = M_X(t), \tag{8.3}$$

where $M_X(t) = E(e^{tX})$ is the moment generating function of the claim 
severity random variable $X$. If such a value exists, it is called the adjustment 
coefficient. 


To see that there may be a solution, consider the two curves in the $(t, y)$ 
plane given by $y_1(t) = 1 + (1 + \theta)\mu t$ and $y_2(t) = M_X(t) = E(e^{tX})$. Now, 
$y_1(t)$ is a straight line with positive slope $(1 + \theta)\mu$. The mgf may not exist 
at all or may exist only for some values of $t$. Assume for this discussion 
that the mgf exists for all nonnegative $t$. Then $y_2'(t) = E(Xe^{tX}) > 0$ and 
$y_2''(t) = E(X^2 e^{tX}) > 0$. Because $y_1(0) = y_2(0) = 1$, the two curves intersect 
when $t = 0$. But $y_2'(0) = E(X) = \mu < (1 + \theta)\mu = y_1'(0)$. Thus, as $t$ increases 
from 0 the curve $y_2(t)$ initially falls below $y_1(t)$, but because $y_2'(t) > 0$ and 
$y_2''(t) > 0$, eventually $y_2(t)$ will cross $y_1(t)$ at a point $\kappa > 0$. The point $\kappa$ is 
the adjustment coefficient. 

We remark that there may not be a positive solution to (8.3), for example, 
if the single claim amount distribution has no moment generating function 
(e.g., Pareto, lognormal). 


Example 8.4 (Exponential claim amounts) If $X$ has an exponential 
distribution with mean $\mu$, determine the adjustment coefficient. 


We have $F(x) = 1 - e^{-x/\mu}$, $x > 0$. Then, $M_X(t) = (1 - \mu t)^{-1}$. 
Thus, from (8.3), $\kappa$ satisfies 

$$1 + (1 + \theta)\mu\kappa = (1 - \mu\kappa)^{-1}. \tag{8.4}$$

As noted earlier, $\kappa = 0$ is one solution and the positive solution is $\kappa = 
\theta/[\mu(1 + \theta)]$. The graph in Figure 8.1 displays plots of the left- and right-hand 
sides of (8.4) for the case $\theta = 0.2$ and $\mu = 1$. They intersect at 0 and at the 
adjustment coefficient, $\kappa = 0.2/1.2 = 0.1667$. □ 
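The algebra behind the positive solution of (8.4) is short. As a worked step (valid for $\mu\kappa < 1$, where the mgf exists), multiply both sides by $1 - \mu\kappa$ and collect terms:

```latex
\begin{aligned}
\bigl[1 + (1+\theta)\mu\kappa\bigr](1 - \mu\kappa) &= 1 \\
1 + \theta\mu\kappa - (1+\theta)\mu^2\kappa^2 &= 1 \\
\mu\kappa\bigl[\theta - (1+\theta)\mu\kappa\bigr] &= 0 \\
\kappa = 0 \quad &\text{or} \quad \kappa = \frac{\theta}{(1+\theta)\mu}.
\end{aligned}
```

With $\theta = 0.2$ and $\mu = 1$ the positive root is $0.2/1.2 = 0.1667$, as stated in the example.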


Example 8.5 (A gamma distribution) Suppose that the relative security 
loading is $\theta = 2$ and the gamma distribution has $\alpha = 2$. To avoid confusion, let $\beta$ 
be the scale parameter for the gamma distribution. Determine the adjustment 
coefficient. 


The single claim size density is 

$$f(x) = \beta^{-2} x e^{-x/\beta}, \quad x > 0.$$
Fig. 8.1 Left and right sides of the adjustment coefficient equation. 


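For the gamma distribution $\mu = 2\beta$ and 

$$M_X(t) = \int_0^\infty e^{tx} f(x)\,dx = (1 - \beta t)^{-2}, \quad t < \frac{1}{\beta}.$$

Then from (8.3) we obtain 

$$1 + 6\kappa\beta = (1 - \beta\kappa)^{-2},$$

which may be rearranged as 

$$6\beta^3\kappa^3 - 11\beta^2\kappa^2 + 4\beta\kappa = 0.$$

This is easily factored as 

$$\kappa\beta(2\kappa\beta - 1)(3\kappa\beta - 4) = 0.$$

The adjustment coefficient is the only root that solves the original equation,$^1$ 
namely $\kappa = 1/(2\beta)$. □ 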
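A quick numerical check of Example 8.5 confirms which root is usable. The sketch below picks an arbitrary scale parameter for illustration; the function names `lhs` and `gamma_mgf` and the value of `BETA` are hypothetical.

```python
def lhs(kappa, beta, theta=2.0):
    """Left side of (8.3) for a gamma severity with alpha = 2, so mu = 2 * beta."""
    return 1 + (1 + theta) * (2 * beta) * kappa

def gamma_mgf(t, beta):
    """M_X(t) = (1 - beta*t)^(-2), which exists only for t < 1/beta."""
    if t >= 1 / beta:
        raise ValueError("mgf does not exist for t >= 1/beta")
    return (1 - beta * t) ** -2

BETA = 1.5                                   # arbitrary illustrative scale parameter
small_root, large_root = 1 / (2 * BETA), 4 / (3 * BETA)

# The smaller root balances the two sides of (8.3).
gap_at_small_root = lhs(small_root, BETA) - gamma_mgf(small_root, BETA)
# The larger root lies outside the region where the mgf exists.
large_root_is_admissible = large_root < 1 / BETA
```

The gap at $1/(2\beta)$ is zero up to rounding, while $4/(3\beta)$ exceeds $1/\beta$ and so cannot be an argument of the mgf.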


$^1$ Of the two roots, the larger one, $4/(3\beta)$, is not a legitimate argument for the mgf. The 
mgf exists only for values less than $1/\beta$. When solving such equations, the adjustment 
coefficient will always be the smallest positive solution. 


For general claim amount distributions, it is not possible to explicitly solve 
for $\kappa$ as was done in the above two examples. Normally, one must resort to 
numerical methods, many of which require an initial guess as to the value of 
$\kappa$. To find such a value, note that for (8.3) we may write 

$$\begin{aligned}
1 + (1 + \theta)\mu\kappa &= E(e^{\kappa X}) \\
&= E(1 + \kappa X + \tfrac{1}{2}\kappa^2 X^2 + \cdots) \\
&> E(1 + \kappa X + \tfrac{1}{2}\kappa^2 X^2) \\
&= 1 + \kappa\mu + \tfrac{1}{2}\kappa^2 E(X^2).
\end{aligned}$$

Then subtraction of $1 + \kappa\mu$ from both sides of the inequality and division by 
$\kappa$ results in 

$$\kappa < \frac{2\theta\mu}{E(X^2)}. \tag{8.5}$$

The right-hand side of (8.5) is usually a satisfactory initial value of $\kappa$. Other 
inequalities for $\kappa$ are given in the exercises. 


Example 8.6 The aggregate loss random variable has variance equal to three 
times the mean. Determine a bound on the adjustment coefficient. 


For the compound Poisson distribution, $E(S_t) = \lambda\mu t$, $\mathrm{Var}(S_t) = \lambda t E(X^2)$, 
and so $E(X^2) = 3\mu$. Hence, from (8.5), $\kappa < 2\theta/3$. □ 


Define 

$$H(t) = 1 + (1 + \theta)\mu t - M_X(t). \tag{8.6}$$

Then the adjustment coefficient $\kappa > 0$ satisfies $H(\kappa) = 0$. To solve this 
equation, use the Newton-Raphson formula, 

$$\kappa_{j+1} = \kappa_j - \frac{H(\kappa_j)}{H'(\kappa_j)},$$

where 

$$H'(t) = (1 + \theta)\mu - M_X'(t),$$

beginning with an initial value $\kappa_0$. Because $H(0) = 0$, care must be taken so 
as not to converge to the value 0. 


Example 8.7 Suppose the Poisson parameter is $\lambda = 4$ and the premium rate 
is $c = 7$. Further suppose the individual loss amount distribution is given by 

$$\Pr(X = 1) = 0.6, \quad \Pr(X = 2) = 0.4.$$

Determine the adjustment coefficient. 


We have 

$$\mu = E(X) = (1)(0.6) + (2)(0.4) = 1.4$$

and 

$$E(X^2) = (1)^2(0.6) + (2)^2(0.4) = 2.2.$$

Then $\theta = c(\lambda\mu)^{-1} - 1 = 7(5.6)^{-1} - 1 = 0.25$. From (8.5), we know that $\kappa$ 
must be less than $\kappa_0 = 2(0.25)(1.4)/2.2 = 0.3182$. Now, 

$$M_X(t) = 0.6e^{t} + 0.4e^{2t}$$

and so from (8.6) 

$$H(t) = 1 + 1.75t - 0.6e^{t} - 0.4e^{2t}.$$

We also have 

$$M_X'(t) = (1e^{t})(0.6) + (2e^{2t})(0.4)$$

and so 

$$H'(t) = 1.75 - 0.6e^{t} - 0.8e^{2t}.$$

Our initial guess is $\kappa_0 = 0.3182$. Then $H(\kappa_0) = -0.02381$ and $H'(\kappa_0) = 
-0.5865$. Thus, an updated estimate of $\kappa$ is 

$$\kappa_1 = 0.3182 - \frac{-0.02381}{-0.5865} = 0.2776.$$

Then $H(0.2776) = -0.003091$, $H'(0.2776) = -0.4358$, and 

$$\kappa_2 = 0.2776 - \frac{-0.003091}{-0.4358} = 0.2705.$$

Continuing in this fashion, we get $\kappa_3 = 0.2703$, $\kappa_4 = 0.2703$, and so the 
adjustment coefficient $\kappa = 0.2703$ to four decimal places of accuracy. □ 
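The Newton-Raphson iteration of Example 8.7 is easy to automate. The following is a minimal sketch for a discrete severity distribution; the function names, tolerance, and iteration cap are hypothetical choices, and the starting value is the bound from (8.5).

```python
import math

LAMBDA, C = 4.0, 7.0
SEVERITY_PF = {1: 0.6, 2: 0.4}
MU = sum(x * p for x, p in SEVERITY_PF.items())                  # E(X) = 1.4
SECOND_MOMENT = sum(x * x * p for x, p in SEVERITY_PF.items())   # E(X^2) = 2.2
THETA = C / (LAMBDA * MU) - 1                                     # 0.25

def H(t):
    """H(t) = 1 + (1 + theta) * mu * t - M_X(t), equation (8.6)."""
    mgf = sum(p * math.exp(t * x) for x, p in SEVERITY_PF.items())
    return 1 + (1 + THETA) * MU * t - mgf

def H_prime(t):
    """H'(t) = (1 + theta) * mu - M_X'(t)."""
    mgf_prime = sum(x * p * math.exp(t * x) for x, p in SEVERITY_PF.items())
    return (1 + THETA) * MU - mgf_prime

def adjustment_coefficient(start, tolerance=1e-10, max_iterations=50):
    """Iterate kappa <- kappa - H(kappa)/H'(kappa) until the step is negligible."""
    kappa = start
    for _ in range(max_iterations):
        step = H(kappa) / H_prime(kappa)
        kappa -= step
        if abs(step) < tolerance:
            return kappa
    raise RuntimeError("Newton-Raphson did not converge")

kappa_0 = 2 * THETA * MU / SECOND_MOMENT     # bound (8.5), about 0.3182
kappa = adjustment_coefficient(kappa_0)
```

Starting from the upper bound keeps the iteration away from the trivial root at 0; a starting value close to 0 could converge there instead, which is the caution raised after (8.6).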


There is another form for the adjustment coefficient equation (8.3) which 
is often useful. In particular, the following is an alternative definition of $\kappa$: 

$$1 + \theta = \int_0^\infty e^{\kappa x} f_e(x)\,dx, \tag{8.7}$$

where 

$$f_e(x) = \frac{1 - F(x)}{\mu}, \quad x > 0, \tag{8.8}$$

is the equilibrium probability density function discussed in Section 4.3.3. 
To see that (8.7) is equivalent to (8.3), note that 

$$\int_0^\infty e^{\kappa x} f_e(x)\,dx = \frac{M_X(\kappa) - 1}{\mu\kappa},$$

from Exercise 4.15. Thus replacement of $M_X(\kappa)$ by $1 + (1 + \theta)\mu\kappa$ in this 
expression yields (8.7). 
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8.2.2 Lundberg’s inequality 


The first main use of the adjustment coefficient lies in the following result. 


Theorem 8.8 Suppose $\kappa > 0$ satisfies (8.3). Then the probability of ruin
$\psi(u)$ satisfies
$$\psi(u) \le e^{-\kappa u}, \quad u \ge 0. \tag{8.9}$$


Proof: Let $\psi_n(u)$ be the probability that ruin occurs on or before the nth
claim for $n = 0, 1, 2, \ldots$. We will prove by induction on $n$ that $\psi_n(u) \le e^{-\kappa u}$.
Obviously, $\psi_0(u) = 0 \le e^{-\kappa u}$. Now assume that $\psi_n(u) \le e^{-\kappa u}$ and we wish
to show that $\psi_{n+1}(u) \le e^{-\kappa u}$. Let us consider what happens on the first
claim. The time until the first claim occurs is exponential with probability
density function $\lambda e^{-\lambda t}$. If the claim occurs at time $t > 0$, the surplus available
to pay the claim at time $t$ is $u + ct$. Thus, ruin occurs on the first claim if
the amount of the claim exceeds $u + ct$. The probability that this happens is
$1 - F(u + ct)$. If the amount of the claim is $x$, where $0 \le x \le u + ct$, ruin
does not occur on the first claim. After payment of the claim, there is still
a surplus of $u + ct - x$ remaining. Ruin can still occur on the next $n$ claims.
Because the surplus process has stationary and independent increments, this
is the same probability as if we had started at the time of the first claim with
initial reserve $u + ct - x$ and been ruined in the first $n$ claims. Thus, by the
law of total probability, we have the recursive equation²
$$\psi_{n+1}(u) = \int_0^\infty \left[1 - F(u+ct) + \int_0^{u+ct} \psi_n(u+ct-x)\,dF(x)\right] \lambda e^{-\lambda t}\,dt.$$

Thus, using the inductive hypothesis,
$$\begin{aligned}
\psi_{n+1}(u) &= \int_0^\infty \left[\int_{u+ct}^\infty dF(x) + \int_0^{u+ct} \psi_n(u+ct-x)\,dF(x)\right] \lambda e^{-\lambda t}\,dt \\
&\le \int_0^\infty \left[\int_{u+ct}^\infty e^{-\kappa(u+ct-x)}\,dF(x) + \int_0^{u+ct} e^{-\kappa(u+ct-x)}\,dF(x)\right] \lambda e^{-\lambda t}\,dt,
\end{aligned}$$


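² The Stieltjes integral notation "$dF(x)$" may be viewed as a notational convention to cover
the situation where $X$ is discrete, continuous, or mixed. In the continuous case, replace
$dF(x)$ by $f(x)\,dx$ and proceed in the usual Riemann integral fashion. In the discrete case,
the integral is simply notation for the usual sum involving the probability mass function.

where we have also used the fact that $-\kappa(u + ct - x) > 0$ when $x > u + ct$.
Combining the two inner integrals gives
$$\begin{aligned}
\psi_{n+1}(u) &\le \int_0^\infty \left[\int_0^\infty e^{-\kappa(u+ct-x)}\,dF(x)\right] \lambda e^{-\lambda t}\,dt \\
&= \lambda e^{-\kappa u} \int_0^\infty e^{-\kappa c t} \left[\int_0^\infty e^{\kappa x}\,dF(x)\right] e^{-\lambda t}\,dt \\
&= \lambda e^{-\kappa u} \int_0^\infty e^{-(\lambda+\kappa c)t}\,[M_X(\kappa)]\,dt \\
&= \lambda M_X(\kappa) e^{-\kappa u} \int_0^\infty e^{-(\lambda+\kappa c)t}\,dt \\
&= \frac{\lambda M_X(\kappa)}{\lambda + \kappa c}\, e^{-\kappa u}.
\end{aligned}$$

But from (8.3) and (8.2)
$$\lambda M_X(\kappa) = \lambda[1 + (1+\theta)\kappa\mu] = \lambda + \kappa(1+\theta)\lambda\mu = \lambda + \kappa c$$
and so $\psi_{n+1}(u) \le e^{-\kappa u}$. Therefore, $\psi_n(u) \le e^{-\kappa u}$ for all $n$ and so $\psi(u) =
\lim_{n\to\infty} \psi_n(u) \le e^{-\kappa u}$. □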


This result is important because it may be used to examine the interplay
between the level of surplus $u$ and the premium loading $\theta$, both parameters
which are under the control of the insurer. Suppose one is willing to tolerate
a probability of ruin of $\alpha$ (e.g., $\alpha = 0.01$) and a surplus of $u$ is available. Then
a loading of
$$\theta = \frac{u\left\{E\left[\exp\left(-\frac{\ln\alpha}{u}X\right)\right] - 1\right\}}{-\mu\ln\alpha} - 1$$
ensures that (8.3) is satisfied by $\kappa = (-\ln\alpha)/u$. Then, by Theorem 8.8,
$\psi(u) \le e^{-\kappa u} = e^{\ln\alpha} = \alpha$. On the other hand, if a specified loading of $\theta$ is
desired, the surplus $u$ required to ensure a ruin probability of no more than
$\alpha$ is given by
$$u = \frac{-\ln\alpha}{\kappa}$$
because $\psi(u) \le e^{-\kappa u} = e^{\ln\alpha} = \alpha$ as before.
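The two uses of Lundberg's inequality just described can be sketched numerically. As an assumption made purely for illustration, the claim distribution is the two-point distribution of Example 8.7 and the adjustment coefficient 0.2703 found there is reused; the function names are hypothetical.

```python
import math

MU = 1.4  # E(X) for the two-point distribution Pr(X=1) = 0.6, Pr(X=2) = 0.4

def mgf(t):
    """Moment generating function M_X(t) of the illustrative two-point distribution."""
    return 0.6 * math.exp(t) + 0.4 * math.exp(2 * t)

def surplus_for_target(alpha, kappa):
    """Surplus u = -ln(alpha)/kappa at which the bound e^{-kappa u} equals alpha."""
    return -math.log(alpha) / kappa

def loading_for_target(alpha, u):
    """Loading theta for which kappa = -ln(alpha)/u satisfies (8.3)."""
    kappa = -math.log(alpha) / u
    return u * (mgf(kappa) - 1) / (-MU * math.log(alpha)) - 1

alpha = 0.01
u_required = surplus_for_target(alpha, 0.2703)        # kappa from Example 8.7 (theta = 0.25)
theta_implied = loading_for_target(alpha, u_required)  # should be close to 0.25
print(f"u = {u_required:.2f}, implied loading = {theta_implied:.4f}")
```

Feeding the required surplus back into the loading formula recovers approximately the loading that produced the adjustment coefficient, which is a quick consistency check on both formulas.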
Also, (8.9) allows us to show that
$$\psi(\infty) = \lim_{u\to\infty} \psi(u) = 0. \tag{8.10}$$
Because the ruin probability is also nonnegative, we have
$$0 \le \psi(u) \le e^{-\kappa u} \tag{8.11}$$
and thus
$$0 \le \lim_{u\to\infty} \psi(u) \le \lim_{u\to\infty} e^{-\kappa u} = 0,$$
which establishes (8.10). We then have that for the survival probability
$$\phi(\infty) = 1. \tag{8.12}$$


8.2.3 Exercises 


8.1 Calculate the adjustment coefficient if $\theta = 0.32$ and the claim size distri-
bution is the same as that of Example 8.5.


8.2 Calculate the adjustment coefficient if the individual loss size density is
$f(x) = \sqrt{\beta/(\pi x)}\,e^{-\beta x}$, $x > 0$.


8.3 Calculate the adjustment coefficient if $c = 3$, $\lambda = 4$, and the individual
loss size density is $f(x) = e^{-2x} + \frac{3}{2}e^{-3x}$, $x > 0$. Do not use an iterative
numerical procedure.


8.4 If $c = 2.99$, $\lambda = 1$, and the individual loss size distribution is given by
$\Pr(X = 1) = 0.2$, $\Pr(X = 2) = 0.3$, and $\Pr(X = 3) = 0.5$, use the
Newton-Raphson procedure to numerically obtain the adjustment coefficient.


8.5 Repeat Exercise 8.3 using the Newton-Raphson procedure beginning with
an initial estimate based on (8.5).


8.6 Suppose that $E(X^3)$ is known where $X$ is a generic individual loss amount
random variable. Prove that the adjustment coefficient $\kappa$ satisfies
$$\kappa < \frac{-3E(X^2) + \sqrt{9[E(X^2)]^2 + 24\theta\mu E(X^3)}}{2E(X^3)}.$$
Also prove that the right-hand side of this inequality is strictly less than the
bound given in (8.5), namely $2\theta\mu/E(X^2)$.


8.7 Recall that, if $g''(x) \ge 0$, Jensen’s inequality implies $E[g(Y)] \ge g[E(Y)]$.
Also, from Section 4.3.3,
$$\int_0^\infty x f_e(x)\,dx = \frac{E(X^2)}{2\mu},$$
where $f_e(x)$ is defined by (8.8).

(a) Use (8.7) and the above results to show that
$$\kappa \le \frac{2\mu\ln(1+\theta)}{E(X^2)}.$$

(b) Show that $\ln(1+\theta) < \theta$ for $\theta > 0$ and thus the inequality in (a)
is tighter than that from (8.5). Hint: Consider $h(\theta) = \theta - \ln(1+\theta)$,
$\theta > 0$.

(c) If there is a maximum claim size of $m$, show that (8.7) becomes
$$1 + \theta = \int_0^m e^{\kappa x} f_e(x)\,dx.$$
Show that the right-hand side of this equality satisfies
$$\int_0^m e^{\kappa x} f_e(x)\,dx \le e^{\kappa m}$$
and hence that
$$\kappa \ge \frac{1}{m}\ln(1+\theta).$$

8.8 In Section 4.3.3 it was shown that, if $F(x)$ has an increasing mean residual
life [which is implied if $F(x)$ has a decreasing hazard rate], then
$$\int_x^\infty f_e(y)\,dy \ge 1 - F(x), \quad x \ge 0.$$

(a) Let $Y$ have probability density function $f_e(y)$, $y \ge 0$, and let $X$
have cumulative distribution function $F(x)$. Show that
$$\Pr(Y > y) \ge \Pr(X > y), \quad y \ge 0,$$
and hence that
$$\Pr(e^{\kappa Y} > t) \ge \Pr(e^{\kappa X} > t), \quad t \ge 1.$$

(b) Use (a) to show that $E(e^{\kappa Y}) \ge E(e^{\kappa X})$.
(c) Use (b) to show that $\kappa \le \theta/[\mu(1+\theta)]$.
(d) Prove that, if the above inequality is reversed,
$$\kappa \ge \frac{\theta}{\mu(1+\theta)}.$$

8.9 Suppose that $\kappa > 0$ satisfies (8.3) and also that
$$S(x) \le \rho\, e^{-\kappa x} \int_x^\infty e^{\kappa y}\,dF(y) \tag{8.13}$$
for $0 < \rho \le 1$, where $S(x) = 1 - F(x)$. Prove that $\psi(u) \le \rho e^{-\kappa u}$, $u \ge 0$. Hint:
Use the method of Theorem 8.8.

8.10 Continue the previous exercise. Use integration by parts to show that
$$\int_x^\infty e^{\kappa y}\,dF(y) = e^{\kappa x}S(x) + \kappa\int_x^\infty e^{\kappa y}S(y)\,dy, \quad x \ge 0.$$


8.11 Suppose $F(x)$ has a decreasing hazard rate (Section 4.3.3). Prove that
$S(y) \ge S(x)S(y - x)$, $x \ge 0$, $y \ge x$. Then use Exercise 8.10 to show that
(8.13) is satisfied with $\rho^{-1} = E(e^{\kappa X})$. Use (8.3) to conclude that
$$\psi(x) \le [1 + (1+\theta)\kappa\mu]^{-1} e^{-\kappa x}, \quad x \ge 0.$$

8.12 Suppose $F(x)$ has a hazard rate $\mu(x) = -(d/dx)\ln S(x)$ which satisfies
$\mu(x) \le m < \infty$, $x \ge 0$. Use the result in Exercise 8.10 to show that (8.13) is
satisfied with $\rho = 1 - \kappa/m$ and thus
$$\psi(x) \le (1 - \kappa/m)e^{-\kappa x}, \quad x \ge 0.$$

Hint: Show that, for $y > x$, $S(y) \ge S(x)e^{-(y-x)m}$.


8.3 AN INTEGRODIFFERENTIAL EQUATION 


We now consider the problem of finding an explicit formula for the ruin prob-
ability $\psi(u)$ or (equivalently) the survival probability $\phi(u)$. It will be useful
in what follows to consider a slightly more general function.


Definition 8.9 $G(u,y) = \Pr$(ruin occurs with initial reserve $u$ and deficit
immediately after ruin occurs is at most $y$), $u \ge 0$, $y \ge 0$.


For the event described, the surplus immediately after ruin is between 0
and $-y$. We then have
$$\psi(u) = \lim_{y\to\infty} G(u,y), \quad u \ge 0. \tag{8.14}$$
We have the following result.


Theorem 8.10 The function $G(u,y)$ satisfies the equation
$$\frac{\partial}{\partial u}G(u,y) = \frac{\lambda}{c}G(u,y) - \frac{\lambda}{c}\int_0^u G(u-x,y)\,dF(x) - \frac{\lambda}{c}[F(u+y) - F(u)], \quad u \ge 0. \tag{8.15}$$


Proof: Let us again consider what happens with the first claim. The time of
the first claim has the exponential probability density function $\lambda e^{-\lambda t}$, and the
surplus available to pay the first claim at time $t$ is $u + ct$. If the amount of the
claim is $x$, where $0 \le x \le u + ct$, then the first claim does not cause ruin but
reduces the surplus to $u + ct - x$. By the stationary and independent increments
property, ruin with a deficit of at most $y$ would then occur thereafter with
probability $G(u + ct - x, y)$. The only other possibility for ruin to occur with
a deficit of at most $y$ is that the first claim does cause ruin, that is, it occurs
for an amount $x$ where $x > u + ct$ but $x \le u + ct + y$ because, if $x > u + ct + y$,
the deficit would then exceed $y$. The probability that the claim amount $x$
satisfies $u + ct < x \le u + ct + y$ is $F(u + ct + y) - F(u + ct)$. Consequently,
by the law of total probability, we have
$$G(u,y) = \int_0^\infty \left[\int_0^{u+ct} G(u+ct-x,y)\,dF(x) + F(u+ct+y) - F(u+ct)\right] \lambda e^{-\lambda t}\,dt.$$
We wish to differentiate this expression with respect to $u$, and in order to do
so, it is convenient to change the variable of integration from $t$ to $z = u + ct$.
Thus, $t = (z - u)/c$ and $dt = dz/c$. Then with this change of variable we have
$$G(u,y) = \frac{\lambda}{c}e^{(\lambda/c)u}\int_u^\infty e^{-(\lambda/c)z}\left[\int_0^z G(z-x,y)\,dF(x) + F(z+y) - F(z)\right] dz.$$
Recall from the fundamental theorem of calculus that, if $k$ is a function, then
$\frac{d}{du}\int_u^\infty k(z)\,dz = -k(u)$, and we may differentiate with the help of the product
rule to obtain
$$\frac{\partial}{\partial u}G(u,y) = \frac{\lambda}{c}G(u,y) + \frac{\lambda}{c}e^{(\lambda/c)u}\left\{-e^{-(\lambda/c)u}\left[\int_0^u G(u-x,y)\,dF(x) + F(u+y) - F(u)\right]\right\},$$
from which the result follows. □


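We now determine an explicit formula for $G(0,y)$.
Theorem 8.11 The function $G(0,y)$ is given by
$$G(0,y) = \frac{\lambda}{c}\int_0^y [1 - F(x)]\,dx, \quad y \ge 0. \tag{8.16}$$
Proof: First note that
$$0 \le G(u,y) \le \psi(u) \le e^{-\kappa u}$$
and thus
$$0 \le G(\infty,y) = \lim_{u\to\infty} G(u,y) \le \lim_{u\to\infty} e^{-\kappa u} = 0,$$
and therefore $G(\infty,y) = 0$. Also,
$$\int_0^\infty G(u,y)\,du \le \int_0^\infty e^{-\kappa u}\,du = \kappa^{-1} < \infty.$$
Thus let $\tau(y) = \int_0^\infty G(u,y)\,du$ and we know that $0 < \tau(y) < \infty$. Then,
integrate (8.15) with respect to $u$ from 0 to $\infty$ to get, using the above,
$$-G(0,y) = \frac{\lambda}{c}\tau(y) - \frac{\lambda}{c}\int_0^\infty\!\!\int_0^u G(u-x,y)\,dF(x)\,du - \frac{\lambda}{c}\int_0^\infty [F(u+y) - F(u)]\,du.$$
Interchanging the order of integration in the double integral yields
$$G(0,y) = -\frac{\lambda}{c}\tau(y) + \frac{\lambda}{c}\int_0^\infty\!\!\int_x^\infty G(u-x,y)\,du\,dF(x) + \frac{\lambda}{c}\int_0^\infty [F(u+y) - F(u)]\,du,$$
and changing the variable of integration from $u$ to $v = u - x$ in the inner
integral of the double integral results in
$$\begin{aligned}
G(0,y) &= -\frac{\lambda}{c}\tau(y) + \frac{\lambda}{c}\int_0^\infty\!\!\int_0^\infty G(v,y)\,dv\,dF(x) + \frac{\lambda}{c}\int_0^\infty [F(u+y) - F(u)]\,du \\
&= -\frac{\lambda}{c}\tau(y) + \frac{\lambda}{c}\int_0^\infty \tau(y)\,dF(x) + \frac{\lambda}{c}\int_0^\infty [F(u+y) - F(u)]\,du.
\end{aligned}$$
Because $\int_0^\infty dF(x) = 1$, the first two terms on the right-hand side cancel, and
so
$$\begin{aligned}
G(0,y) &= \frac{\lambda}{c}\int_0^\infty [F(u+y) - F(u)]\,du \\
&= \frac{\lambda}{c}\int_0^\infty [1 - F(u)]\,du - \frac{\lambda}{c}\int_0^\infty [1 - F(u+y)]\,du.
\end{aligned}$$
Then change the variable from $u$ to $x = u$ in the first integral and from $u$ to
$x = u + y$ in the second integral. The result is
$$G(0,y) = \frac{\lambda}{c}\int_0^\infty [1 - F(x)]\,dx - \frac{\lambda}{c}\int_y^\infty [1 - F(x)]\,dx = \frac{\lambda}{c}\int_0^y [1 - F(x)]\,dx. \quad □$$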


We remark that (8.16) holds even if there is no adjustment coefficient. The
function $G(0,y)$ is of considerable interest in its own right, but for now
we shall return to the analysis of $\psi(u)$.


Theorem 8.12 The survival probability with no initial reserve satisfies
$$\phi(0) = \frac{\theta}{1+\theta}. \tag{8.17}$$
Proof: Recall that $\mu = \int_0^\infty [1 - F(x)]\,dx$ and note that from (8.16)
$$\psi(0) = \lim_{y\to\infty} G(0,y) = \frac{\lambda}{c}\int_0^\infty [1 - F(x)]\,dx = \frac{\lambda\mu}{c} = \frac{1}{1+\theta}.$$
Thus, $\phi(0) = 1 - \psi(0) = \theta/(1+\theta)$. □


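The general solution to $\phi(u)$ may be obtained from the following integro-
differential equation subject to the initial condition (8.17).


Theorem 8.13 The probability of ultimate survival $\phi(u)$ satisfies
$$\phi'(u) = \frac{\lambda}{c}\phi(u) - \frac{\lambda}{c}\int_0^u \phi(u-x)\,dF(x), \quad u \ge 0. \tag{8.18}$$

Proof: From (8.15) with $y \to \infty$ and (8.14),
$$\psi'(u) = \frac{\lambda}{c}\psi(u) - \frac{\lambda}{c}\int_0^u \psi(u-x)\,dF(x) - \frac{\lambda}{c}[1 - F(u)], \quad u \ge 0. \tag{8.19}$$
In terms of the survival probability $\phi(u) = 1 - \psi(u)$, (8.19) may be expressed
as
$$\begin{aligned}
-\phi'(u) &= \frac{\lambda}{c}[1 - \phi(u)] - \frac{\lambda}{c}\int_0^u [1 - \phi(u-x)]\,dF(x) - \frac{\lambda}{c}[1 - F(u)] \\
&= -\frac{\lambda}{c}\phi(u) - \frac{\lambda}{c}\int_0^u dF(x) + \frac{\lambda}{c}\int_0^u \phi(u-x)\,dF(x) + \frac{\lambda}{c}F(u) \\
&= -\frac{\lambda}{c}\phi(u) + \frac{\lambda}{c}\int_0^u \phi(u-x)\,dF(x)
\end{aligned}$$
because $F(u) = \int_0^u dF(x)$. The result then follows. □


It is largely a matter of taste whether one uses (8.18) or (8.19). We shall
often use (8.18) because it is slightly simpler algebraically. Unfortunately, the
solution for general $F(x)$ is rather complicated and we shall defer this general
solution to Section 8.4. At this point we shall obtain the solution for one
special choice of $F(x)$.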


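Example 8.14 (The exponential distribution) Suppose, as in Example 8.4,
that $F(x) = 1 - e^{-x/\mu}$, $x \ge 0$. Determine $\phi(u)$.


In this case (8.18) becomes
$$\phi'(u) = \frac{\lambda}{c}\phi(u) - \frac{\lambda}{\mu c}\int_0^u \phi(u-x)e^{-x/\mu}\,dx.$$
Change variables in the integral from $x$ to $y = u - x$ to obtain
$$\phi'(u) = \frac{\lambda}{c}\phi(u) - \frac{\lambda}{\mu c}e^{-u/\mu}\int_0^u \phi(y)e^{y/\mu}\,dy. \tag{8.20}$$

We wish to eliminate the integral term in (8.20) so we differentiate with respect
to $u$. This gives
$$\phi''(u) = \frac{\lambda}{c}\phi'(u) + \frac{\lambda}{\mu^2 c}e^{-u/\mu}\int_0^u \phi(y)e^{y/\mu}\,dy - \frac{\lambda}{\mu c}\phi(u).$$
The integral term can be eliminated using (8.20) to produce
$$\phi''(u) = \frac{\lambda}{c}\phi'(u) - \frac{\lambda}{\mu c}\phi(u) + \frac{1}{\mu}\left[\frac{\lambda}{c}\phi(u) - \phi'(u)\right],$$
which simplifies to
$$\phi''(u) = \left(\frac{\lambda}{c} - \frac{1}{\mu}\right)\phi'(u) = -\frac{\theta}{\mu(1+\theta)}\phi'(u).$$
After multiplication by the integrating factor $e^{\theta u/[\mu(1+\theta)]}$ this may be rewrit-
ten as
$$\frac{d}{du}\left[e^{\theta u/[\mu(1+\theta)]}\phi'(u)\right] = 0.$$
Integrating with respect to $u$ gives
$$e^{\theta u/[\mu(1+\theta)]}\phi'(u) = K_1.$$
From (8.20) with $u = 0$ and using (8.17), we thus have
$$K_1 = \phi'(0) = \frac{\lambda}{c}\,\frac{\theta}{1+\theta} = \frac{\lambda}{\lambda\mu(1+\theta)}\,\frac{\theta}{1+\theta} = \frac{\theta}{\mu(1+\theta)^2}.$$
Thus,
$$\phi'(u) = \frac{\theta}{\mu(1+\theta)^2}\exp\left[-\frac{\theta u}{\mu(1+\theta)}\right],$$
which may be integrated again to give
$$\phi(u) = -\frac{1}{1+\theta}\exp\left[-\frac{\theta u}{\mu(1+\theta)}\right] + K_2.$$
Now (8.17) gives $\phi(0) = \theta/(1+\theta)$, and so with $u = 0$ we have $K_2 = 1$. Thus
$$\phi(u) = 1 - \frac{1}{1+\theta}\exp\left[-\frac{\theta u}{\mu(1+\theta)}\right]$$
is the required probability. □

8.3.1 Exercises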


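8.13 Suppose that the claim size distribution is exponential with $F(x) =
1 - e^{-x/\mu}$ as in Example 8.14.

(a) Prove, using (8.15), that $G(u,y) = \psi(u)F(y)$ in this case.
(b) Prove that the distribution of the deficit immediately after ruin
occurs, given that ruin does occur, has the same exponential dis-
tribution given above.


8.14 This exercise involves the derivation of integral equations called defec-
tive renewal equations for $G(u,y)$ and $\psi(u)$. These may be used to derive
various properties of these functions.

(a) Integrate (8.15) over $u$ from 0 to $t$ and use (8.16) to show that
$$\begin{aligned}
G(t,y) ={}& \frac{\lambda}{c}\Lambda(t,y) - \frac{\lambda}{c}\int_0^t \Lambda(t-x,y)\,dF(x) \\
&+ \frac{\lambda}{c}\int_0^y [1 - F(x)]\,dx - \frac{\lambda}{c}\int_0^t [1 - F(u)]\,du \\
&+ \frac{\lambda}{c}\int_0^t [1 - F(u+y)]\,du,
\end{aligned}$$
where $\Lambda(x,y) = \int_0^x G(v,y)\,dv$.

(b) Use integration by parts on the integral $\int_0^t \Lambda(t-x,y)\,dF(x)$ to show
from (a) that
$$\begin{aligned}
G(t,y) ={}& \frac{\lambda}{c}\Lambda(t,y) - \frac{\lambda}{c}\int_0^t G(t-x,y)F(x)\,dx \\
&+ \frac{\lambda}{c}\int_0^{y+t} [1 - F(x)]\,dx - \frac{\lambda}{c}\int_0^t [1 - F(u)]\,du.
\end{aligned}$$
(c) Prove using (b) that
$$G(u,y) = \frac{\lambda}{c}\int_0^u G(u-x,y)[1 - F(x)]\,dx + \frac{\lambda}{c}\int_u^{y+u} [1 - F(x)]\,dx.$$
(d) Prove that
$$\psi(u) = \frac{\lambda}{c}\int_0^u \psi(u-x)[1 - F(x)]\,dx + \frac{\lambda}{c}\int_u^\infty [1 - F(x)]\,dx.$$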


8.4 THE MAXIMUM AGGREGATE LOSS 


We now derive the general solution to the integrodifferential equation (8.18)
subject to the boundary conditions (8.12) and (8.17).

Beginning with an initial reserve $u$, the probability that the surplus will
ever fall below the initial level $u$ is $\psi(0)$ because the surplus process has
stationary and independent increments. Thus the probability of dropping
below the initial level $u$ is the same for all $u$, but we know that when $u = 0$
it is $\psi(0)$.

The key result is that, given that there is a drop below the initial level $u$,
the random variable $Y$ which represents the amount of this initial drop has
the equilibrium probability density function $f_e(y)$, where $f_e(y)$ is given by
(8.8).


Theorem 8.15 Given that there is a drop below the initial level $u$, the random
variable $Y$ which represents the amount of this initial drop has probability
density function $f_e(y) = [1 - F(y)]/\mu$.


Proof: Recall the function $G(u,y)$ from Definition 8.9. Because the surplus
process has stationary and independent increments, $G(0,y)$ also represents
the probability that the surplus drops below its initial level, and the amount
of this drop is at most $y$. Thus, using Theorem 8.11, the amount of the drop,
given that there is a drop, has cumulative distribution function
$$\begin{aligned}
\Pr(Y \le y) &= \frac{G(0,y)}{\psi(0)} \\
&= \frac{\lambda}{c\,\psi(0)}\int_0^y [1 - F(u)]\,du \\
&= \frac{1}{\mu}\int_0^y [1 - F(u)]\,du,
\end{aligned}$$
and the result follows by differentiation. □


If there is a drop of $y$, the surplus immediately after the drop is $u - y$, and
because the surplus process has stationary and independent increments, ruin
occurs thereafter with probability $\psi(u - y)$, provided $u - y$ is nonnegative;
otherwise ruin would have already occurred. The probability of a second drop
is $\psi(0)$, and the amount of the second drop also has density $f_e(y)$ and is
independent of the first drop. Due to the memoryless property of the Poisson
process, the process “starts over” after each drop. Therefore, the total number
of drops $K$ is geometrically distributed, that is, $\Pr(K = 0) = 1 - \psi(0)$,
$\Pr(K = 1) = [1 - \psi(0)]\psi(0)$, and more generally,
$$\Pr(K = k) = [1 - \psi(0)][\psi(0)]^k = \frac{\theta}{1+\theta}\left(\frac{1}{1+\theta}\right)^k, \quad k = 0, 1, 2, \ldots,$$
because $\psi(0) = 1/(1+\theta)$. The usual geometric parameter $\beta$ (in Appendix B)
is thus $1/\theta$ in this case.

After a drop, the surplus immediately begins to increase again. Thus, the
lowest level of the surplus is $u - L$, where $L$, called the maximum aggregate
loss, is the total of all the drop amounts. Let $Y_j$ be the amount of the jth drop,
and because the surplus process has stationary and independent increments,
$\{Y_1, Y_2, \ldots\}$ is a sequence of independent and identically distributed random
variables [each with density $f_e(y)$]. Because the number of drops is $K$, it
follows that
$$L = Y_1 + Y_2 + \cdots + Y_K$$
with $L = 0$ if $K = 0$. Thus, $L$ is a compound geometric random variable with
“claim size density” $f_e(y)$.

Clearly, ultimate survival beginning with initial reserve $u$ occurs if the
maximum aggregate loss $L$ does not exceed $u$, that is,
$$\phi(u) = \Pr(L \le u), \quad u \ge 0.$$
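The compound geometric representation of $L$ lends itself to a simple Monte Carlo check. The sketch below is an illustration only: it assumes exponential claims with mean $\mu$ (for which $f_e$ is again exponential with mean $\mu$), compares against the closed form of Example 8.14, and uses hypothetical function names, seed, and sample size.

```python
import math
import random

def exact_ruin_probability(u, theta, mu):
    """psi(u) = 1 - phi(u) for exponential claims, from Example 8.14."""
    return math.exp(-theta * u / (mu * (1.0 + theta))) / (1.0 + theta)

def simulate_ruin_probability(u, theta, mu, n_paths=100_000, seed=12345):
    """Estimate Pr(L > u), where L is the sum of a geometric number of drops."""
    rng = random.Random(seed)
    q = 1.0 / (1.0 + theta)  # psi(0): probability that another drop occurs
    ruined = 0
    for _ in range(n_paths):
        total = 0.0
        while rng.random() < q:                 # K is geometric: keep dropping w.p. q
            total += rng.expovariate(1.0 / mu)  # drop amount with density f_e
            if total > u:                       # drops are positive, so L > u already
                ruined += 1
                break
    return ruined / n_paths

estimate = simulate_ruin_probability(u=2.0, theta=0.5, mu=1.0)
exact = exact_ruin_probability(u=2.0, theta=0.5, mu=1.0)
print(f"simulated psi(2) = {estimate:.4f}, exact psi(2) = {exact:.4f}")
```

Simulating $L$ directly avoids simulating the surplus process over an infinite time horizon, which is the practical appeal of the maximum aggregate loss formulation.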


Let $F_e^{*0}(y) = 0$ if $y < 0$ and 1 if $y \ge 0$. Also $F_e^{*k}(y) = \Pr\{Y_1 + Y_2 + \cdots + Y_k \le
y\}$ is the cumulative distribution function of the k-fold convolution of the
distribution of $Y$ with itself. We then have the general solution, namely,
$$\phi(u) = \sum_{k=0}^\infty \frac{\theta}{1+\theta}\left(\frac{1}{1+\theta}\right)^k F_e^{*k}(u), \quad u \ge 0.$$
In terms of the ruin probability, this general solution may be expressed as
$$\psi(u) = \sum_{k=1}^\infty \frac{\theta}{1+\theta}\left(\frac{1}{1+\theta}\right)^k S_e^{*k}(u), \quad u \ge 0,$$
where $S_e^{*k}(y) = 1 - F_e^{*k}(y)$. Evidently, $\psi(u)$ is the survival function associated
with the compound geometric random variable $L$, and analytic solutions may
be obtained in a manner similar to those obtained in Section 6.4. An
analytic solution for the important Erlang mixture claim severity pdf³
$$f(x) = \sum_{k=1}^\infty q_k \frac{\beta^{-k}x^{k-1}e^{-x/\beta}}{(k-1)!}, \quad x > 0,$$
where the $q_k$ are positive weights that sum to 1, may be found in Exercise
8.17, and for some other claim severity distributions in the next section.

We may also compute ruin probabilities numerically by computing the
cumulative distribution function of a compound geometric distribution using
any of the techniques described in Chapter 6.


Example 8.16 Suppose the individual loss distribution is Pareto with $\alpha = 3$
and a mean of 500. Let the security loading be $\theta = 0.2$. Determine $\phi(u)$ for
$u = 100, 200, 300, \ldots$.


³ Any continuous positive probability density function may be approximated arbitrarily
accurately by an Erlang mixture pdf, as shown by Tijms [129], p. 163. An Erlang distribution
is a gamma distribution for which the shape parameter $\alpha$ must be an integer.

Table 8.1 Survival probabilities, Pareto losses

      u   $\phi(u)$          u   $\phi(u)$
    100   0.193      5,000   0.687
    200   0.216      7,500   0.787
    300   0.238     10,000   0.852
    500   0.276     15,000   0.923
  1,000   0.355     20,000   0.958
  2,000   0.473     25,000   0.975
  3,000   0.561


We first require the cdf, $F_e(u)$. It can be found from its pdf
$$f_e(u) = \frac{1 - F(u)}{\mu} = \frac{\left(\dfrac{1{,}000}{1{,}000 + u}\right)^3}{500} = \frac{1}{500}\left(\frac{1{,}000}{1{,}000 + u}\right)^3,$$
which happens to be the density function of a Pareto distribution with $\alpha = 2$
and a mean of 1,000. This new Pareto distribution is the severity distribution
for a compound geometric distribution where the parameter is $\beta = 1/\theta =
5$. The compound geometric distribution can be evaluated using any of the
techniques in Chapter 6. We used the recursive formula with a discretization
which preserves the mean and a span of $h = 5$. The cumulative probabilities
are then obtained by summing the discrete probabilities generated by the
recursive formula. The values appear in Table 8.1. □
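The recursive calculation of Example 8.16 can be sketched as follows. Two assumptions are made for brevity and should be read as such: the severity is discretized by the method of rounding rather than by the mean-preserving method used for Table 8.1, and the function names are hypothetical. The results should therefore be close to, but not necessarily identical with, the tabulated values.

```python
def pareto_equilibrium_cdf(y, scale=1000.0):
    """F_e(y) for Example 8.16: a Pareto cdf with alpha = 2 and scale 1,000."""
    return 1.0 - (scale / (scale + y)) ** 2

def survival_probabilities(theta, h, n_steps, cdf):
    """Return phi(k*h), k = 0..n_steps, for the compound geometric variable L."""
    # Discretize the drop distribution on 0, h, 2h, ... by the method of rounding.
    f = [cdf(h / 2.0)]
    for j in range(1, n_steps + 1):
        f.append(cdf(j * h + h / 2.0) - cdf(j * h - h / 2.0))
    a = 1.0 / (1.0 + theta)               # geometric: Pr(K = k) = (1 - a) * a**k
    g = [(1.0 - a) / (1.0 - a * f[0])]    # Pr(L = 0) on the lattice, i.e. P_K(f_0)
    for k in range(1, n_steps + 1):
        # (a, b, 0) recursion with b = 0 for the geometric distribution
        s = sum(f[j] * g[k - j] for j in range(1, k + 1))
        g.append(a * s / (1.0 - a * f[0]))
    # Cumulative sums give phi(k*h) = Pr(L <= k*h).
    phi, running = [], 0.0
    for g_k in g:
        running += g_k
        phi.append(running)
    return phi

phi = survival_probabilities(theta=0.2, h=5.0, n_steps=200, cdf=pareto_equilibrium_cdf)
print(f"phi(100) = {phi[20]:.3f}, phi(1000) = {phi[200]:.3f}")
```

The quadratic cost of the recursion is negligible at this span; a smaller span or a larger $u$ would make a convolution-based method worth considering.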


8.4.1 Exercises 


8.15 Suppose the number of claims follows the Poisson process and the amount
of an individual claim is exponentially distributed with mean 100. The rela-
tive security loading is $\theta = 0.1$. Determine $\psi(1{,}000)$ by using the method of
this section. Use the method of rounding with a span of 50 to discretize the
exponential distribution. Compare your answer to the exact ruin probability
(see Example 8.14).


8.16 Consider the problem of Example 8.5 with $\beta = 50$. Use the method of
this section (with discretization by rounding and a span of 1) to approximate
$\psi(200)$. Compare your answer to the exact ruin probability which may be
found in Example 8.19.

8.17 Suppose that the claim severity pdf is given by
$$f(x) = \sum_{k=1}^\infty q_k \frac{\beta^{-k}x^{k-1}e^{-x/\beta}}{(k-1)!}, \quad x > 0,$$
where $\sum_{k=1}^\infty q_k = 1$. Note that this is a mixture of gamma densities.

(a) Show that
$$f_e(x) = \sum_{k=1}^\infty q_k^* \frac{\beta^{-k}x^{k-1}e^{-x/\beta}}{(k-1)!}, \quad x > 0,$$
where
$$q_k^* = \frac{\sum_{j=k}^\infty q_j}{\sum_{j=1}^\infty j q_j}, \quad k = 1, 2, \ldots,$$
and also show that $\sum_{k=1}^\infty q_k^* = 1$.
(b) Define
$$Q^*(z) = \sum_{k=1}^\infty q_k^* z^k$$
and use the results of Exercise 6.35 to show that
$$\psi(u) = \sum_{n=1}^\infty c_n \sum_{j=0}^{n-1} \frac{(u/\beta)^j e^{-u/\beta}}{j!}, \quad u \ge 0,$$
where
$$C(z) = \left\{1 - \frac{1}{\theta}[Q^*(z) - 1]\right\}^{-1}$$
is a compound geometric pgf, with probabilities which may be com-
puted recursively by $c_0 = \theta(1+\theta)^{-1}$ and
$$c_k = \frac{1}{1+\theta}\sum_{j=1}^k q_j^* c_{k-j}, \quad k = 1, 2, \ldots,$$


where gj = 0 UJEL orat 
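(c) Use (b) to show that
$$\psi(u) = e^{-u/\beta}\sum_{j=0}^\infty \bar{C}_j \frac{(u/\beta)^j}{j!}, \quad u \ge 0,$$
where $\bar{C}_j = \sum_{k=j+1}^\infty c_k$, $j = 0, 1, \ldots$. Then use (b) to show that
the $\bar{C}_n$s may be computed recursively from
$$\bar{C}_n = \frac{1}{1+\theta}\sum_{k=1}^n q_k^* \bar{C}_{n-k} + \frac{1}{1+\theta}\sum_{k=n+1}^\infty q_k^*, \quad n = 1, 2, \ldots,$$
beginning with $\bar{C}_0 = (1+\theta)^{-1}$.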
8.18 (a) Using Exercise 8.14(c) prove that
$$G(u,y) = \frac{1}{1+\theta}\int_0^u G(u-x,y)f_e(x)\,dx + \frac{1}{1+\theta}\int_u^{y+u} f_e(x)\,dx,$$
where $G(u,y)$ is defined in Section 8.3, and using Exercise 8.14(d)
that
$$\psi(u) = \frac{1}{1+\theta}\int_0^u \psi(u-x)f_e(x)\,dx + \frac{1}{1+\theta}\int_u^\infty f_e(x)\,dx,$$
where $f_e(x)$ is given by (8.8).
(b) Prove the results in (a) directly by using probabilistic arguments.
Hint: Condition on the amount of the first drop in surplus and use
the law of total probability.


8.5 CRAMÉR’S ASYMPTOTIC RUIN FORMULA AND TIJMS’
APPROXIMATION


There is another very useful piece of information regarding the ruin probability
which involves the adjustment coefficient $\kappa$. The following theorem gives a
result known as Cramér’s asymptotic ruin formula. The notation $a(x) \sim b(x)$,
$x \to \infty$, means $\lim_{x\to\infty} a(x)/b(x) = 1$.

Theorem 8.17 Suppose $\kappa > 0$ satisfies (8.3). Then the ruin probability sat-
isfies
$$\psi(u) \sim Ce^{-\kappa u}, \quad u \to \infty, \tag{8.21}$$
where
$$C = \frac{\mu\theta}{M_X'(\kappa) - \mu(1+\theta)}, \tag{8.22}$$
and $M_X(t) = E(e^{tX}) = \int_0^\infty e^{tx}\,dF(x)$ is the moment generating function of
the claim severity random variable $X$.


Proof: The proof of this result is complicated and utilizes the key renewal
theorem together with a defective renewal equation for $\psi(u)$ which may be
found in Exercise 8.14(d), or equivalently in Exercise 8.18(a). The interested
reader should see Rolski et al. [113], Sec. 5.4.2 for details. □


Thus, in addition to Lundberg’s inequality given by Theorem 8.8, the ruin
probability behaves like an exponential function for large $u$. Note that, for
Lundberg’s inequality (8.9) to hold, $C$ given by (8.22) must satisfy $C \le 1$.
Also, although (8.21) is an asymptotic approximation, it
is known to be quite accurate even for $u$ which is not too large (particularly
if the relative security loading $\theta$ is itself not too large). Before continuing, let
us consider an important special case.


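Example 8.18 (The exponential distribution) If $F(x) = 1 - e^{-x/\mu}$, $x \ge 0$,
determine the asymptotic ruin formula.


We found in Example 8.4 that the adjustment coefficient was given by
$\kappa = \theta/[\mu(1+\theta)]$ and $M_X(t) = (1 - \mu t)^{-1}$. Thus,
$$M_X'(t) = \frac{d}{dt}(1 - \mu t)^{-1} = \mu(1 - \mu t)^{-2}.$$
Also,
$$M_X'(\kappa) = \mu(1 - \mu\kappa)^{-2} = \mu[1 - \theta(1+\theta)^{-1}]^{-2} = \mu(1+\theta)^2.$$
Thus, from (8.22),
$$C = \frac{\mu\theta}{\mu(1+\theta)^2 - \mu(1+\theta)} = \frac{\theta}{(1+\theta)(1+\theta-1)} = \frac{1}{1+\theta}.$$
The asymptotic formula (8.21) becomes
$$\psi(u) \sim \frac{1}{1+\theta}\exp\left[-\frac{\theta u}{\mu(1+\theta)}\right], \quad u \to \infty.$$
This is the exact ruin probability as was demonstrated in Example 8.14. □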


In cases other than when $F(x)$ is the exponential distribution, the exact
solution for $\psi(u)$ is more complicated (including in particular the general
compound geometric solution given in Section 8.4). A simple analytic ap-
proximation was suggested by Tijms [129], pp. 271-272 to take advantage of
the accuracy for large $u$ of Cramér’s asymptotic ruin formula given in The-
orem 8.17. The idea is to add an exponential term to (8.21) to improve the
accuracy for small $u$ as well. Thus, the Tijms approximation is defined as
$$\psi_T(u) = \left(\frac{1}{1+\theta} - C\right)e^{-u/\alpha} + Ce^{-\kappa u}, \quad u \ge 0, \tag{8.23}$$
where $\alpha$ is chosen so that the approximation also matches the compound
geometric mean of the maximum aggregate loss. As shown in Section 4.3.3,
the mean of the amount of the drop in surplus (in the terminology of Section
8.4) is $E(Y) = \int_0^\infty y f_e(y)\,dy = E(X^2)/(2\mu)$, where $\mu = E(X)$ and $X$ is a
generic claim severity random variable. Similarly, the number of drops in
surplus $K$ is geometrically distributed with parameter $1/\theta$, so from Appendix
B we have $E(K) = 1/\theta$. Because the maximum aggregate loss is the compound
geometric random variable $L$, it follows from (6.6) that its mean is
$$E(L) = E(K)E(Y) = \frac{E(X^2)}{2\mu\theta}.$$
But $\psi(u) = \Pr(L > u)$, and from (3.9) on page 32 with $k = 1$ and $u = \infty$ we
have $E(L) = \int_0^\infty \psi(u)\,du$. Therefore, in order that the Tijms approximation
match the mean, we need to replace $\psi(u)$ by $\psi_T(u)$ in the integral. Thus from
(8.23)
$$\int_0^\infty \psi_T(u)\,du = \left(\frac{1}{1+\theta} - C\right)\alpha + \frac{C}{\kappa},$$
and equating this to $E(L)$ yields
$$\left(\frac{1}{1+\theta} - C\right)\alpha + \frac{C}{\kappa} = \frac{E(X^2)}{2\mu\theta},$$
which may be solved for $\alpha$ to give
$$\alpha = \frac{E(X^2)/(2\mu\theta) - C/\kappa}{1/(1+\theta) - C}. \tag{8.24}$$
To summarize, Tijms’ approximation to the ruin probability is given by (8.23),
with $\alpha$ given by (8.24).

In addition to providing a simple analytic approximation which is of good
quality, Tijms’ approximation $\psi_T(u)$ has the added benefit of exactly repro-
ducing the true value of $\psi(u)$ in some cases. (Some insight into this phe-
nomenon is provided by Exercise 8.21.) In particular, it can be shown that
$\psi_T(u) = \psi(u)$ if the claim size pdf is of the form $f(x) = p(\beta^{-1}e^{-x/\beta}) + (1 -
p)(\beta^{-2}xe^{-x/\beta})$, $x \ge 0$, with $0 \le p < 1$ (of course, if $p = 1$, this is the ex-
ponential density for which Tijms’ approximation is not used). We have the
following example.


Example 8.19 (A gamma distribution with a shape parameter⁴ of 2) As
in Example 8.5, suppose that $\theta = 2$, and the single claim size density is
$f(x) = \beta^{-2}xe^{-x/\beta}$, $x \ge 0$. Determine the Tijms approximation to the ruin
probability.


The moment generating function is $M_X(t) = (1 - \beta t)^{-2}$, $t < 1/\beta$, from
which one finds that $M_X'(t) = 2\beta(1 - \beta t)^{-3}$, and $\mu = M_X'(0) = 2\beta$. As shown
in Example 8.5, the adjustment coefficient $\kappa > 0$ satisfies $1 + (1+\theta)\kappa\mu =
M_X(\kappa)$, which in this example becomes $1 + 6\beta\kappa = (1 - \beta\kappa)^{-2}$ and is given
by $\kappa = 1/(2\beta)$. We will first compute Cramér’s asymptotic ruin formula. We
have $M_X'(\kappa) = M_X'[1/(2\beta)] = 2\beta(1 - \frac{1}{2})^{-3} = 16\beta$. Thus, (8.22) yields
$$C = \frac{(2\beta)(2)}{16\beta - (2\beta)(1+2)} = \frac{2}{5},$$
and from (8.21), $\psi(u) \sim \frac{2}{5}e^{-u/(2\beta)}$, $u \to \infty$. We next turn to Tijms’ approxi-
mation given by (8.23), which becomes in this case
$$\psi_T(u) = \left(\frac{1}{1+2} - \frac{2}{5}\right)e^{-u/\alpha} + \frac{2}{5}e^{-u/(2\beta)} = \frac{2}{5}e^{-u/(2\beta)} - \frac{1}{15}e^{-u/\alpha}.$$
It remains to compute $\alpha$. We have $M_X''(t) = 6\beta^2(1 - \beta t)^{-4}$, from which it
follows that $E(X^2) = M_X''(0) = 6\beta^2$. The amount of the drop in surplus
has mean $E(Y) = E(X^2)/(2\mu) = 6\beta^2/(4\beta) = 3\beta/2$. Because the number of
drops has mean $E(K) = 1/\theta = \frac{1}{2}$, the maximum aggregate loss has mean
$E(L) = E(K)E(Y) = 3\beta/4$, and $\alpha$ must satisfy $E(L) = \int_0^\infty \psi_T(u)\,du$, or
equivalently (8.24). That is, $\alpha$ is given by⁵
$$\alpha = \frac{\frac{3\beta}{4} - \frac{2}{5}(2\beta)}{\frac{1}{3} - \frac{2}{5}} = \frac{3\beta}{4}.$$
Tijms’ approximation thus becomes
$$\psi_T(u) = \frac{2}{5}e^{-u/(2\beta)} - \frac{1}{15}e^{-4u/(3\beta)}, \quad u \ge 0.$$
As mentioned above, $\psi(u) = \psi_T(u)$ in this case. □

⁴ For a gamma distribution, the shape parameter is the one denoted by $\alpha$ in Appendix A
and is not to be confused with the value of $\alpha$ in the Tijms approximation.


Another class of claim severity distributions for which the Tijms approxi-
mation is exactly equal to the true ruin probability is that with probability
density function of the form $f(x) = p(\beta_1 e^{-\beta_1 x}) + (1 - p)(\beta_2 e^{-\beta_2 x})$, $x \ge 0$.
If $0 < p < 1$, then this distribution is a mixture of two exponentials, whereas
if $p = \beta_2/(\beta_2 - \beta_1)$, then the distribution, referred to as a combination of
two exponentials, is that of the sum of two independent exponential random
variables with means $\beta_1^{-1}$ and $\beta_2^{-1}$, where $\beta_1 \ne \beta_2$. The next example illustrates
these ideas.


⁵ It is actually not a coincidence that $1/\alpha$ is the other root of the adjustment coefficient
equation, as may be seen from Example 8.5. It is instructive to compute $\alpha$ in this manner,
however, because this approach is applicable in general for arbitrary claim size distributions,
including those in which Tijms’ approximation does not exactly reproduce the true ruin
probability.

Example 8.20 (A mixture of exponential distributions) Suppose that $\theta = \frac{4}{11}$
and the single claim size density is $f(x) = e^{-3x} + 10e^{-5x}/3$, $x \ge 0$. Determine
the Tijms approximation to the ruin probability.


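First we note that the moment generating function is
$$M_X(t) = \int_0^\infty e^{tx}f(x)\,dx = (3 - t)^{-1} + \tfrac{10}{3}(5 - t)^{-1}.$$

Thus, $M_X'(t) = (3 - t)^{-2} + \frac{10}{3}(5 - t)^{-2}$, from which it follows that $\mu = M_X'(0) =
\frac{1}{9} + \frac{2}{15} = \frac{11}{45}$. Equation (8.3) then implies that the adjustment coefficient $\kappa > 0$
satisfies $1 + \frac{1}{3}\kappa = (3 - \kappa)^{-1} + \frac{10}{3}(5 - \kappa)^{-1}$. Multiplication by $3(3 - \kappa)(5 - \kappa)$
yields
$$3(\kappa - 3)(\kappa - 5) + \kappa(\kappa - 3)(\kappa - 5) = 3(5 - \kappa) + 10(3 - \kappa).$$
That is,
$$3(\kappa^2 - 8\kappa + 15) + \kappa^3 - 8\kappa^2 + 15\kappa = 45 - 13\kappa.$$
Rearrangement yields
$$0 = \kappa^3 - 5\kappa^2 + 4\kappa = \kappa(\kappa - 1)(\kappa - 4),$$
and $\kappa = 1$ because it is the smallest positive root.

Next, we determine Cramér’s asymptotic formula. Equation (8.22) be-
comes, with $M_X'(\kappa) = M_X'(1) = \frac{1}{4} + \frac{5}{24} = \frac{11}{24}$,
$$C = \frac{\left(\frac{11}{45}\right)\left(\frac{4}{11}\right)}{\frac{11}{24} - \left(\frac{11}{45}\right)\left(\frac{15}{11}\right)} = \frac{32}{45},$$
and thus Cramér’s asymptotic formula is $\psi(u) \sim \frac{32}{45}e^{-u}$, $u \to \infty$.
Equation (8.23) then becomes
$$\psi_T(u) = \left(\frac{11}{15} - \frac{32}{45}\right)e^{-u/\alpha} + \frac{32}{45}e^{-u} = \frac{1}{45}e^{-u/\alpha} + \frac{32}{45}e^{-u}, \quad u \ge 0.$$
In order to compute $\alpha$, we note that $M_X''(t) = 2(3 - t)^{-3} + \frac{20}{3}(5 - t)^{-3}$,
and thus $E(X^2) = M_X''(0) = \frac{2}{27} + \frac{20}{3}\left(\frac{1}{125}\right) = \frac{86}{675}$. The mean of the maximum
aggregate loss is therefore
$$E(L) = \frac{E(X^2)}{2\mu\theta} = \frac{\frac{86}{675}}{2\left(\frac{11}{45}\right)\left(\frac{4}{11}\right)} = \frac{43}{60}.$$
Then (8.24) gives
$$\alpha = \frac{\frac{43}{60} - \frac{32}{45}}{\frac{11}{15} - \frac{32}{45}} = \frac{1}{4},$$
and so Tijms’ approximation becomes $\psi_T(u) = \frac{1}{45}e^{-4u} + \frac{32}{45}e^{-u}$. As mentioned
above, $\psi(u) = \psi_T(u)$ in this case also. □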
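The arithmetic of Example 8.20 can be checked exactly with rational numbers. The following sketch uses Python's standard `fractions` module; the helper `mgf` and the variable names are hypothetical and serve only to mirror formulas (8.3), (8.22), and (8.24).

```python
from fractions import Fraction as Fr

theta = Fr(4, 11)
coef = [Fr(1), Fr(10, 3)]  # f(x) = sum_i coef_i * exp(-rate_i * x)
rate = [Fr(3), Fr(5)]

def mgf(t, order=0):
    """order-th derivative (0, 1, or 2) of M_X(t) = sum_i coef_i / (rate_i - t)."""
    factorial = [1, 1, 2][order]
    return sum(factorial * a / (r - t) ** (order + 1) for a, r in zip(coef, rate))

mu = mgf(Fr(0), 1)   # E(X)   = M_X'(0)
ex2 = mgf(Fr(0), 2)  # E(X^2) = M_X''(0)

kappa = Fr(1)        # smallest positive root of the cubic in the example
assert 1 + (1 + theta) * kappa * mu == mgf(kappa)  # kappa satisfies (8.3)

C = mu * theta / (mgf(kappa, 1) - mu * (1 + theta))                   # (8.22)
mean_L = ex2 / (2 * mu * theta)                                       # E(L)
alpha = (mean_L - C / kappa) / (1 / (1 + theta) - C)                  # (8.24)
print(mu, ex2, C, mean_L, alpha)
```

Exact rational arithmetic is used here because $\alpha$ is a ratio of two small differences, a situation in which floating-point rounding can be noticeable.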


It is not hard to see from (8.23) that $\psi_T(u) \sim Ce^{-\kappa u}$, $u \to \infty$, if $\kappa < 1/\alpha$.
In this situation, $\psi_T(u)$ will equal $\psi(u)$ when $u = 0$ and when $u \to \infty$ as well
as matching the compound geometric mean. It can be shown that a sufficient
condition for the asymptotic agreement between $\psi_T(u)$ and $\psi(u)$ to hold as
$u \to \infty$ is that the nonexponential claim size cumulative distribution function
$F(x)$ has either a nondecreasing or nonincreasing mean residual life function
[which is implied if $F(x)$ has a nonincreasing or nondecreasing hazard rate, as
discussed in Section 4.3.3]. It is also interesting to note that $\psi_T(x) \ge Ce^{-\kappa x}$
in the former case and $\psi_T(x) \le Ce^{-\kappa x}$ in the latter case. See Willmot [137]
for proofs of these facts.

The following example illustrates the accuracy of Cramér’s asymptotic for-
mula and Tijms’ approximation.


Example 8.21 (A gamma distribution with a shape parameter of 3) Suppose
the claim severity distribution is a gamma distribution with a mean of 1 and
density given by $f(x) = 27x^2e^{-3x}/2$, $x \ge 0$. Determine the exact ruin proba-
bility, Cramér’s asymptotic ruin formula, and Tijms’ approximation when the
relative security loading $\theta$ in each is 0.25, 1, and 4, and the initial surplus $u$
is 0.10, 0.25, 0.50, 0.75, and 1.


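The moment generating function is $M_X(t) = (1 - t/3)^{-3}$.

The exact values of $\psi(u)$ may be obtained using the algorithm presented in
Exercise 8.17. That is, $\psi(u) = e^{-3u}\sum_{j=0}^\infty \bar{C}_j(3u)^j/j!$, $u \ge 0$, where the $\bar{C}_j$s
may be computed recursively using
$$\bar{C}_j = \frac{1}{1+\theta}\sum_{k=1}^j q_k^* \bar{C}_{j-k} + \frac{1}{1+\theta}\sum_{k=j+1}^\infty q_k^*, \quad j = 1, 2, \ldots,$$
with $\bar{C}_0 = 1/(1+\theta)$, $q_1^* = q_2^* = q_3^* = \frac{1}{3}$, and $q_k^* = 0$ otherwise. The required
values are listed in Table 8.2 under the heading titled Exact.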
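The recursion for the exact values can be sketched as follows. The function name and the truncation point `n_terms` of the infinite series are assumptions made for illustration; the weights $q_1^* = q_2^* = q_3^* = 1/3$ and $\beta = 1/3$ are those of this example.

```python
import math

def exact_ruin_gamma3(u, theta, n_terms=60):
    """psi(u) for gamma claims with shape 3 and mean 1, via the C-bar recursion."""
    a = 1.0 / (1.0 + theta)
    q_star = {1: 1.0 / 3.0, 2: 1.0 / 3.0, 3: 1.0 / 3.0}  # q*_k = 0 otherwise
    c_bar = [a]                                          # C-bar_0 = 1 / (1 + theta)
    for n in range(1, n_terms):
        head = sum(q_star.get(k, 0.0) * c_bar[n - k] for k in range(1, n + 1))
        tail = sum(w for k, w in q_star.items() if k > n)
        c_bar.append(a * (head + tail))
    x = 3.0 * u  # u / beta with beta = 1/3
    series = sum(c * x ** j / math.factorial(j) for j, c in enumerate(c_bar))
    return math.exp(-x) * series

for theta in (0.25, 1.0, 4.0):
    row = [round(exact_ruin_gamma3(u, theta), 4) for u in (0.10, 0.25, 0.50, 0.75, 1.00)]
    print(theta, row)
```

Truncating the series is harmless here because the terms $(3u)^j/j!$ decay factorially for the small values of $u$ considered.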

Cramér’s asymptotic ruin probabilities are given by the approximation (8.21), with $\kappa$ obtained from (8.3) numerically for each value of $\theta$ using the Newton-Raphson approach described in Section 8.2.1. The coefficient $C$ is then obtained from (8.22). The required values are listed in Table 8.2 under the heading titled Cramér.

Tijms’ approximation is obtained using (8.23) with $\alpha$ satisfying (8.24), and the values are listed in Table 8.2 under the heading titled Tijms.
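The three calculations of this example can be sketched in Python as follows. This is an illustration under stated assumptions, not the text's own program: equations (8.3) and (8.21) through (8.24) are not reproduced in this section, so the code assumes their usual forms, namely $M_X(\kappa) = 1 + (1+\theta)E(X)\kappa$, $C = \theta E(X)/[M_X'(\kappa) - E(X)(1+\theta)]$, $\psi_T(u) = [1/(1+\theta) - C]e^{-u/\alpha} + Ce^{-\kappa u}$, and $\alpha$ chosen to match the mean of the maximum aggregate loss, $E(X^2)/[2\theta E(X)]$. Bisection is used in place of Newton-Raphson for simplicity, and all function names are hypothetical.

```python
import math

# Gamma claims with shape 3 and mean 1 (rate 3), as in Example 8.21.
MEAN, SECOND_MOMENT = 1.0, 4.0 / 3.0  # E(X) and E(X^2)

def exact_ruin(u, theta, terms=120):
    """psi(u) = exp(-3u) * sum_j Cbar_j (3u)^j / j!, using the recursion for Cbar_j."""
    q_star = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
    phi = 1.0 / (1.0 + theta)
    c_bar = [phi]  # Cbar_0
    for j in range(1, terms):
        conv = sum(q_star[k] * c_bar[j - k] for k in q_star if k <= j)
        tail = sum(q for k, q in q_star.items() if k > j)
        c_bar.append(phi * (conv + tail))
    total, weight = 0.0, 1.0  # weight holds (3u)^j / j!
    for j in range(terms):
        total += c_bar[j] * weight
        weight *= 3.0 * u / (j + 1)
    return math.exp(-3.0 * u) * total

def adjustment_coefficient(theta):
    """Root kappa in (0, 3) of M_X(kappa) = 1 + (1 + theta) E(X) kappa (assumed form of (8.3))."""
    def g(k):
        return (1.0 - k / 3.0) ** -3 - 1.0 - (1.0 + theta) * MEAN * k
    lo, hi = 1e-6, 3.0 - 1e-9  # g < 0 just above 0 and g > 0 near 3
    for _ in range(200):       # bisection, swapped in for Newton-Raphson
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cramer_constant(theta, kappa):
    """C from the assumed form of (8.22); M_X'(kappa) = (1 - kappa/3)^(-4)."""
    m_prime = (1.0 - kappa / 3.0) ** -4
    return theta * MEAN / (m_prime - MEAN * (1.0 + theta))

def cramer_ruin(u, theta):
    kappa = adjustment_coefficient(theta)
    return cramer_constant(theta, kappa) * math.exp(-kappa * u)

def tijms_ruin(u, theta):
    kappa = adjustment_coefficient(theta)
    c = cramer_constant(theta, kappa)
    weight = 1.0 / (1.0 + theta) - c
    # alpha matches the mean of the maximum aggregate loss (assumed form of (8.24)).
    alpha = (SECOND_MOMENT / (2.0 * theta * MEAN) - c / kappa) / weight
    return weight * math.exp(-u / alpha) + c * math.exp(-kappa * u)

for theta in (0.25, 1.0, 4.0):
    for u in (0.10, 0.25, 0.50, 0.75, 1.00):
        print(f"{theta:4.2f} {u:4.2f} {exact_ruin(u, theta):.4f} "
              f"{cramer_ruin(u, theta):.4f} {tijms_ruin(u, theta):.4f}")
```

The printed rows can be compared line by line with Table 8.2.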

Table 8.2 Ruin probabilities with gamma losses

θ       u       Exact    Cramér   Tijms
0.25    0.10    0.7834   0.8076   0.7844
        0.25    0.7562   0.7708   0.7571
        0.50    0.7074   0.7131   0.7074
        0.75    0.6577   0.6597   0.6573
        1.00    0.6097   0.6103   0.6093
1.00    0.10    0.4744   0.5332   0.4764
        0.25    0.4342   0.4700   0.4361
        0.50    0.3664   0.3809   0.3665
        0.75    0.3033   0.3088   0.3026
        1.00    0.2484   0.2502   0.2476
4.00    0.10    0.1839   0.2654   0.1859
        0.25    0.1594   0.2106   0.1615
        0.50    0.1209   0.1432   0.1212
        0.75    0.0882   0.0974   0.0875
        1.00    0.0626   0.0663   0.0618

The values in the table, which may also be found in Tijms [129], p. 272 and Willmot [137], demonstrate that Tijms’ approximation is an accurate approximation to the true value in this situation, particularly for small $\theta$. Cramér’s asymptotic formula is also remarkably accurate for small $\theta$ and $u$. Because this gamma distribution has an increasing hazard rate (as discussed in Example 4.17), Tijms’ approximate ruin probabilities are guaranteed to be smaller than Cramér’s asymptotic ruin probabilities, and this may be seen to be true from the table. It also follows that the exact values, Cramér’s asymptotic values, and Tijms’ approximate values all must converge as $u \to \infty$, but the agreement can be seen to be fairly close even for $u = 1$. □


8.5.1 Exercises 


8.19 Show that (8.22) may be reexpressed as

$$C = \frac{\theta}{\kappa E(Ye^{\kappa Y})},$$

where $Y$ has pdf $f_e(y)$. Hence prove for the problem of Exercise 8.17 that

$$\psi(u) \sim \frac{\theta}{\kappa\beta\sum_{j=1}^{\infty} j q_j^* (1 - \beta\kappa)^{-j-1}}\, e^{-\kappa u}, \quad u \to \infty,$$

where $\kappa > 0$ satisfies

$$1 + \theta = Q^*\{(1 - \beta\kappa)^{-1}\} = \sum_{j=1}^{\infty} q_j^* (1 - \beta\kappa)^{-j}.$$

8.20 Recall the function $G(u, y)$ defined in Section 8.3. It can be shown using the result of Exercise 8.14(c) that Cramér’s asymptotic ruin formula may be generalized to

$$G(u, y) \sim C(y)e^{-\kappa u}, \quad u \to \infty,$$

where
a ur fy e i ¥ f.(x)dz dt 
) = Ne) F) 
(a) Demonstrate that Cramér’s asymptotic ruin formula is recovered as $y \to \infty$.
(b) Demonstrate using Exercise 8.13 that the above asymptotic formula for $G(u, y)$ is an equality for all $u$ in the exponential claims case with $F(x) = 1 - e^{-x/\mu}$.


8.21 Suppose that the following formula for the ruin probability is known to hold:

$$\psi(u) = C_1e^{-r_1u} + C_2e^{-r_2u}, \quad u \ge 0,$$

where $C_1 \ne 0$, $C_2 \ne 0$, and (without loss of generality) $0 < r_1 < r_2$.


(a) Determine the relative security loading $\theta$.

(b) Determine the adjustment coefficient $\kappa$.

(c) Prove that 0 < C1 <1. 

(d) Determine Cramér’s asymptotic ruin formula. 

(e) Prove that $\psi_T(u) = \psi(u)$, where $\psi_T(u)$ is Tijms’ approximation to the ruin probability.

8.22 Suppose that 9 = = and the claim size density is given by f(z) = 
(1-+6z)e"5*, x >O. 

(a) Determine Cramér’s asymptotic ruin formula. 

(b) Determine the ruin probability y(u). 


8.23 Suppose that 0 = a and the claim size density is given by f(z) 
Qe-4#*# + fe", a > 0. 

(a) Determine Cramér’s asymptotic ruin formula. 

(b) Determine the ruin probability y(u). 
8.24 Suppose that 0 = 2 and the claim size density is given by f(x) 
3e74? + fe", c>0. 

(a) Determine Cramér’s asymptotic ruin formula. 

(b) Determine the ruin probability (wu). 


i 


8.25 Suppose that @ = n and the claim size density is the convolution of two 
exponential distributions given by f(x) = i 3e 3-20-24 dy, x > 0. 


(a) Determine Cramér’s asymptotic ruin formula. 
(b) Determine the ruin probability $\psi(u)$.

Fig. 8.2 Sample path for a Poisson surplus process.


8.6 THE BROWNIAN MOTION RISK PROCESS 


In this section, we study the relationship between Brownian motion (the Wiener process) and the surplus process $\{U_t;\ t \ge 0\}$, where

$$U_t = u + ct - S_t, \quad t \ge 0, \tag{8.25}$$

and $\{S_t;\ t \ge 0\}$ is the total loss process defined by

$$S_t = X_1 + X_2 + \cdots + X_{N_t}, \quad t \ge 0,$$

where $\{N_t;\ t \ge 0\}$ is a Poisson process with rate $\lambda$ and $S_t = 0$ when $N_t = 0$. As earlier in this chapter, we assume that the individual losses $\{X_1, X_2, \ldots\}$ are independently and identically distributed positive random variables whose moment generating function exists. The surplus process $\{U_t;\ t \ge 0\}$ increases continuously with slope $c$, the premium rate per unit time, and has successive downward jumps of $\{X_1, X_2, \ldots\}$ at random jump times $\{T_1, T_2, \ldots\}$, as illustrated by Figure 8.2. In that figure, $u = 20$, $c = 35$, $\lambda = 3$, and $X$ has an exponential distribution with mean 10.

Let

$$Z_t = U_t - u = ct - S_t, \quad t \ge 0. \tag{8.26}$$

Then $Z_0 = 0$. Because $S_t$ has a compound distribution, the process $\{Z_t;\ t \ge 0\}$ has mean

$$E(Z_t) = ct - E(S_t) = ct - \lambda tE(X)$$

and variance

$$\mathrm{Var}(Z_t) = \lambda tE(X^2).$$
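The mean and variance formulas can be checked by simulation. The following Python sketch draws $Z_t$ repeatedly using the Figure 8.2 parameters ($c = 35$, $\lambda = 3$, exponential losses with mean 10) and compares sample moments with $ct - \lambda tE(X) = 5t$ and $\lambda tE(X^2) = 600t$. The function names, the seed, and the number of draws are illustrative choices, not part of the text.

```python
import random

def simulate_z(t, c, lam, mean_claim, rng):
    """One draw of Z_t = c*t - S_t with Poisson(lam) arrivals and exponential claims."""
    total_loss = 0.0
    clock = rng.expovariate(lam)  # time of the first claim
    while clock <= t:
        total_loss += rng.expovariate(1.0 / mean_claim)  # claim size with the given mean
        clock += rng.expovariate(lam)                    # exponential waiting time
    return c * t - total_loss

def sample_moments(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

rng = random.Random(2024)  # arbitrary seed for reproducibility
c, lam, mean_claim, t = 35.0, 3.0, 10.0, 1.0
draws = [simulate_z(t, c, lam, mean_claim, rng) for _ in range(20000)]
mean, var = sample_moments(draws)

second_moment = 2.0 * mean_claim ** 2       # E(X^2) for an exponential claim
print(mean, c * t - lam * t * mean_claim)   # sample mean vs. theoretical 5
print(var, lam * t * second_moment)         # sample variance vs. theoretical 600
```

With these parameters the theoretical drift and variance per unit time are 5 and 600, the same values used for the Brownian motion with drift shown in Figure 8.3.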
We now introduce the corresponding stochastic process based on Brownian motion.


Definition 8.22 A continuous-time stochastic process $\{W_t;\ t \ge 0\}$ is a Brownian motion process if

1. $W_0 = 0$;
2. $\{W_t;\ t \ge 0\}$ has stationary and independent increments; and
3. for every $t > 0$, $W_t$ is normally distributed with mean 0 and variance $\sigma^2 t$.


The Brownian motion process, also called the Wiener process or white noise, has been used extensively in describing various physical phenomena. When $\sigma^2 = 1$, it is called standard Brownian motion. The Scottish botanist Robert Brown discovered the process in 1827 and used it to describe the continuous irregular motion of a particle immersed in a liquid or gas. In 1905 Albert Einstein explained this motion by postulating perpetual collision of the particle with the surrounding medium. Norbert Wiener provided the analytical description of the process in a series of papers beginning in 1918. Since then it has been used in many areas of application from quantum mechanics to describing price levels on the stock market. It has become the key model underpinning modern financial theory.


Definition 8.23 A continuous-time stochastic process $\{W_t;\ t \ge 0\}$ is called a Brownian motion with drift process if it satisfies the properties of a Brownian motion process except that $W_t$ has mean $\mu t$ rather than 0, for some $\mu > 0$.


A Brownian motion with drift is illustrated in Figure 8.3. This process has $u = 20$, $\mu = 5$, and $\sigma^2 = 600$. The illustrated process has an initial surplus of 20, so the mean of $W_t$ is $20 + 5t$. Technically, $W_t - 20$ is a Brownian motion with drift process.

We now show how the surplus process (8.26) based on the compound Poisson risk process is related to the Brownian motion with drift process. We will take a limit of the process (8.26) as the expected number of downward jumps becomes large and simultaneously the size of the jumps becomes small. Because the Brownian motion with drift process is characterized by the infinitesimal mean $\mu$ and infinitesimal variance $\sigma^2$, we force the mean and variance functions to be the same for the two processes. In this way, the Brownian motion with drift can be thought of as an approximation to the compound Poisson-based surplus process. Similarly, the compound Poisson process can be used as an approximation for Brownian motion.

Fig. 8.3 Sample path for a Brownian motion with drift.

Let

$$\mu = c - \lambda E[X]$$

and

$$\sigma^2 = \lambda E[X^2]$$

denote the infinitesimal mean and variance of the Brownian motion with drift process. Then

$$\lambda = \frac{\sigma^2}{E[X^2]} \tag{8.27}$$

and

$$c = \mu + \sigma^2\frac{E[X]}{E[X^2]}. \tag{8.28}$$

Now, in order to take limits, we can treat the jump size $X$ as a scaled version of some other random variable $Y$, so that $X = \alpha Y$, where $Y$ has fixed mean and variance. Then

$$\lambda = \frac{\sigma^2}{E[Y^2]}\cdot\frac{1}{\alpha^2}$$

and

$$c = \mu + \sigma^2\frac{E[Y]}{E[Y^2]}\cdot\frac{1}{\alpha}.$$

Then, in order for $\lambda \to \infty$, we let $\alpha \to 0$.

Because the process $\{S_t;\ t \ge 0\}$ is a continuous-time process with stationary and independent increments, so are the processes $\{U_t;\ t \ge 0\}$ and $\{Z_t;\ t \ge 0\}$. This will then also be the case for the limiting process. Because $Z_0 = 0$, we only need to establish that for every $t$, in the limit, $Z_t$ is normally distributed with mean $\mu t$ and variance $\sigma^2 t$ according to Definitions 8.22 and 8.23. We do this by looking at the moment generating function of $Z_t$:

$$M_{Z_t}(r) = M_{ct - S_t}(r) = E\{\exp[r(ct - S_t)]\} = \exp\left(t\{rc + \lambda[M_X(-r) - 1]\}\right).$$

Then

$$\begin{aligned}
\frac{\ln M_{Z_t}(r)}{t} &= rc + \lambda[M_X(-r) - 1] \\
&= r[\mu + \lambda E(X)] + \lambda\left[1 - rE(X) + \frac{r^2}{2!}E(X^2) - \frac{r^3}{3!}E(X^3) + \cdots - 1\right] \\
&= r\mu + \frac{r^2}{2}\lambda E(X^2) - \lambda\left[\frac{r^3}{3!}E(X^3) - \frac{r^4}{4!}E(X^4) + \cdots\right] \\
&= r\mu + \frac{r^2}{2}\sigma^2 - \lambda\alpha^2\left[\alpha\frac{r^3}{3!}E(Y^3) - \alpha^2\frac{r^4}{4!}E(Y^4) + \cdots\right] \\
&= r\mu + \frac{r^2}{2}\sigma^2 - \sigma^2\left[\alpha\frac{r^3}{3!}\frac{E(Y^3)}{E(Y^2)} - \alpha^2\frac{r^4}{4!}\frac{E(Y^4)}{E(Y^2)} + \cdots\right].
\end{aligned}$$

Because all terms except $\alpha$ are fixed, as $\alpha \to 0$, we have

$$\lim_{\alpha \to 0} M_{Z_t}(r) = \exp\left(r\mu t + \frac{r^2}{2}\sigma^2 t\right),$$

which is the mgf of the normal distribution with mean $\mu t$ and variance $\sigma^2 t$. This establishes that the limiting process is Brownian motion with drift.
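The convergence can be observed numerically. The sketch below assumes, purely for illustration, that $Y$ is exponential with mean 1, so that $E(Y) = 1$, $E(Y^2) = 2$, and $M_X(-r) = 1/(1 + \alpha r)$ for $X = \alpha Y$. It evaluates $\ln M_{Z_t}(r)/t = rc + \lambda[M_X(-r) - 1]$ with $\lambda$ and $c$ set by (8.27) and (8.28), and compares it with the normal limit $r\mu + r^2\sigma^2/2$. The function names and parameter values are hypothetical.

```python
def log_mgf_rate(r, alpha, mu, sigma2):
    """ln M_{Z_t}(r) / t when X = alpha * Y and Y is exponential with mean 1 (an assumption)."""
    ey, ey2 = 1.0, 2.0                     # E(Y) and E(Y^2)
    lam = sigma2 / (ey2 * alpha ** 2)      # (8.27) with E(X^2) = alpha^2 E(Y^2)
    c = mu + sigma2 * ey / (ey2 * alpha)   # (8.28) with E(X) = alpha E(Y)
    mgf_x_neg_r = 1.0 / (1.0 + alpha * r)  # M_X(-r) for an exponential with mean alpha
    return r * c + lam * (mgf_x_neg_r - 1.0)

def normal_limit(r, mu, sigma2):
    """ln of the normal mgf per unit time: r*mu + r^2 sigma^2 / 2."""
    return r * mu + 0.5 * r ** 2 * sigma2

mu, sigma2, r = 5.0, 600.0, 0.05
for alpha in (10.0, 1.0, 0.1, 0.01, 0.001):
    gap = log_mgf_rate(r, alpha, mu, sigma2) - normal_limit(r, mu, sigma2)
    print(alpha, gap)  # the gap shrinks toward 0 as alpha decreases
```

For this choice of $Y$ the expression simplifies to $r\mu + r^2\sigma^2/[2(1 + \alpha r)]$, which makes the limit as $\alpha \to 0$ explicit.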

From Figure 8.2, it is clear that the process $U_t$ is differentiable everywhere except at jump points. As the number of jump points increases indefinitely, the process becomes nowhere differentiable. Another property of a Brownian motion process is that its paths are continuous functions of $t$ with probability 1. Intuitively, this occurs because the jump sizes become small as $\alpha \to 0$.

Finally, the total distance traveled in $(0, t]$ by the process $U_t$ is

$$D = ct + S_t = ct + X_1 + \cdots + X_{N_t},$$

which has expected value

$$\begin{aligned}
E[D] &= ct + \lambda tE[X] \\
&= t\left[\mu + \sigma^2\frac{E(Y)}{E(Y^2)}\frac{1}{\alpha}\right] + t\sigma^2\frac{E(Y)}{E(Y^2)}\frac{1}{\alpha} \\
&= t\left[\mu + 2\sigma^2\frac{E(Y)}{E(Y^2)}\frac{1}{\alpha}\right].
\end{aligned}$$

This quantity becomes indefinitely large as $\alpha \to 0$. Hence, we have

$$\lim_{\alpha \to 0} E[D] = \infty.$$

This means that the expected distance traveled in a finite time interval is infinitely large! For a more rigorous discussion of the properties of the Brownian motion process, the text by Karlin and Taylor [71], Ch. 7 is recommended.

Because $Z_t = U_t - u$, we can just add $u$ to the Brownian motion with drift process and then use (8.27) and (8.28) to develop an approximation for the process (8.26). Of course, the larger the value of $\lambda$ and the smaller the jumps, the better will be the approximation. For a very large block of insurance policies (for example, for an entire company), this may be appropriate. In this case, the probability of ultimate ruin and the distribution of time until ruin are easily obtained from the approximating Brownian motion with drift process. This is done in the next section. Similarly, if a process is known to be Brownian motion with drift, a compound Poisson surplus process can be used as an approximation.


8.7 BROWNIAN MOTION AND THE PROBABILITY OF RUIN


Let $\{W_t;\ t \ge 0\}$ denote the Brownian motion with drift process with mean function $\mu t$ and variance function $\sigma^2 t$. Let $U_t = u + W_t$ denote the Brownian motion with drift process with initial surplus $U_0 = u$.

We consider the probability of ruin in a finite time interval $(0, \tau)$ as well as the distribution of time until ruin if ruin occurs. Let $T = \min_{t>0}\{t : U_t < 0\}$ be the time at which ruin occurs (with $T = \infty$ if ruin does not occur). Letting $\tau \to \infty$ will give ultimate ruin probabilities.

The probability of ruin before time $\tau$ can be expressed as

$$\begin{aligned}
\psi(u, \tau) &= 1 - \phi(u, \tau) \\
&= \Pr\{T < \tau\} \\
&= \Pr\left\{\min_{0<t<\tau} U_t < 0\right\} \\
&= \Pr\left\{\min_{0<t<\tau} W_t < -u\right\}.
\end{aligned}$$


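Theorem 8.24 For the process $U_t$ described above, the ruin probability is given by

$$\psi(u, \tau) = \Phi\left(-\frac{u + \mu\tau}{\sigma\sqrt{\tau}}\right) + \exp\left(-\frac{2\mu}{\sigma^2}u\right)\Phi\left(-\frac{u - \mu\tau}{\sigma\sqrt{\tau}}\right), \tag{8.29}$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution.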


Fig. 8.4 Type B sample path and its reflection.


Proof: Any sample path of $U_t$ with a final level $U_\tau < 0$ must have first crossed the barrier $U_t = 0$ at some time $T < \tau$. For any such path $U_t$, we define a new path $U_t^*$ which is the same as the original sample path for all $t < T$ but is the reflection about the barrier $U_t = 0$ of the original sample path for all $t > T$. Then

$$U_t^* = \begin{cases} U_t, & t \le T, \\ -U_t, & t > T. \end{cases} \tag{8.30}$$

The reflected path $U_t^*$ has final value $U_\tau^* = -U_\tau$. These are illustrated in Figure 8.4, which is based on the sample path in Figure 8.3.

Now consider any path that crosses the barrier $U_t = 0$ in the time interval $(0, \tau)$. Any such path is one of two possible types:

Type A: One that has a final value of $U_\tau < 0$.

Type B: One that has a final value of $U_\tau > 0$.

Any path of Type B is a reflection of some other path of Type A. Hence, sample paths can be considered in reflecting pairs. The probability of ruin at some time in $(0, \tau)$ is the total probability of all such pairs:

$$\psi(u, \tau) = \Pr\{T < \tau\} = \Pr\left\{\min_{0<t<\tau} U_t < 0\right\},$$

where it is understood that all probabilities are conditional on $U_0 = u$. This probability is obtained by considering all original paths of Type A with final values $U_\tau = x < 0$. By adding all the corresponding reflecting paths $U_t^*$ as well, all possible paths that cross the ruin barrier are considered. Note that the case $U_\tau = 0$ has been left out. The probability of this happening is zero and so this event can be ignored. In Figure 8.4 the original sample path ended at a positive surplus value and so is of Type B. The reflection is of Type A.

Let $A_x$ and $B_x$ denote the sets of all possible paths of Types A and B, respectively, which end at $U_\tau = x$ for Type A and $U_\tau = -x$ for Type B. Further let $\Pr\{A_x\}$ and $\Pr\{B_x\}$ denote the total probability associated with the paths in the sets. Hence the probability of ruin is

$$\Pr\{T < \tau\} = \int_{-\infty}^{0} \left[\Pr\{A_x\} + \Pr\{B_x\}\right] dx. \tag{8.31}$$

Because $A_x$ is the set of all possible paths ending at $x$,

$$\Pr\{A_x\} = \Pr\{U_\tau = x\},$$

where the right-hand side is the pdf of $U_\tau$. Then

$$\Pr\{T < \tau\} = \int_{-\infty}^{0} \Pr\{U_\tau = x\}\left[1 + \frac{\Pr\{B_x\}}{\Pr\{A_x\}}\right] dx. \tag{8.32}$$


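Because $U_t - u$ is a Brownian motion with drift process, $U_\tau$ is normally distributed with mean $u + \mu\tau$ and variance $\sigma^2\tau$, and so

$$\Pr\{U_\tau = x\} = (2\pi\sigma^2\tau)^{-1/2}\exp\left[-\frac{(x - u - \mu\tau)^2}{2\sigma^2\tau}\right].$$

To obtain $\Pr\{B_x\}/\Pr\{A_x\}$, we condition on all possible ruin times $T$. Then

$$\frac{\Pr\{B_x\}}{\Pr\{A_x\}} = \frac{\int_0^\tau \Pr\{B_x \mid T = t\}\Pr\{T = t\}\,dt}{\int_0^\tau \Pr\{A_x \mid T = t\}\Pr\{T = t\}\,dt} = \frac{\int_0^\tau \Pr\{U_\tau = -x \mid T = t\}\Pr\{T = t\}\,dt}{\int_0^\tau \Pr\{U_\tau = x \mid T = t\}\Pr\{T = t\}\,dt}.$$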


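We are abusing probability notation here. The actual probability of these events is zero. What is being called a probability is really a probability density which can be integrated to produce probabilities for sets which have positive probability.

The conditional pdf of $U_\tau \mid T = t$ is the same as the pdf of $U_\tau - U_t$ because $T = t$ implies $U_t = 0$. The process $U_t$ has independent increments and so $U_\tau - U_t$ has a normal distribution. Then,

$$\begin{aligned}
\Pr\{U_\tau = x \mid T = t\} &= \Pr\{U_\tau - U_t = x\} \\
&= \frac{1}{\sqrt{2\pi\sigma^2(\tau - t)}}\exp\left\{-\frac{[x - \mu(\tau - t)]^2}{2\sigma^2(\tau - t)}\right\} \\
&= \frac{1}{\sqrt{2\pi\sigma^2(\tau - t)}}\exp\left\{-\frac{x^2 - 2x\mu(\tau - t) + \mu^2(\tau - t)^2}{2\sigma^2(\tau - t)}\right\} \\
&= \exp\left(\frac{\mu x}{\sigma^2}\right)\frac{1}{\sqrt{2\pi\sigma^2(\tau - t)}}\exp\left\{-\frac{x^2 + \mu^2(\tau - t)^2}{2\sigma^2(\tau - t)}\right\}.
\end{aligned}$$

Similarly, by replacing $x$ with $-x$,

$$\Pr\{U_\tau = -x \mid T = t\} = \exp\left(-\frac{\mu x}{\sigma^2}\right)\frac{1}{\sqrt{2\pi\sigma^2(\tau - t)}}\exp\left\{-\frac{x^2 + \mu^2(\tau - t)^2}{2\sigma^2(\tau - t)}\right\}.$$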


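We then have

$$\frac{\Pr\{B_x\}}{\Pr\{A_x\}} = \frac{\displaystyle\int_0^\tau \exp\left(-\frac{\mu x}{\sigma^2}\right)\frac{\exp\left\{-\frac{x^2 + \mu^2(\tau - t)^2}{2\sigma^2(\tau - t)}\right\}}{\sqrt{2\pi\sigma^2(\tau - t)}}\Pr\{T = t\}\,dt}{\displaystyle\int_0^\tau \exp\left(\frac{\mu x}{\sigma^2}\right)\frac{\exp\left\{-\frac{x^2 + \mu^2(\tau - t)^2}{2\sigma^2(\tau - t)}\right\}}{\sqrt{2\pi\sigma^2(\tau - t)}}\Pr\{T = t\}\,dt} = \exp\left(-\frac{2\mu}{\sigma^2}x\right).$$

Then from (8.31)

$$\begin{aligned}
\psi(u, \tau) &= \Pr\{T < \tau\} \\
&= \int_{-\infty}^{0} \Pr\{U_\tau = x\}\left[1 + \frac{\Pr\{B_x\}}{\Pr\{A_x\}}\right] dx \\
&= \Phi\left(-\frac{u + \mu\tau}{\sigma\sqrt{\tau}}\right) + \int_{-\infty}^{0} (2\pi\sigma^2\tau)^{-1/2}\exp\left[-\frac{(x - u - \mu\tau)^2}{2\sigma^2\tau} - \frac{2\mu}{\sigma^2}x\right] dx \\
&= \Phi\left(-\frac{u + \mu\tau}{\sigma\sqrt{\tau}}\right) + \int_{-\infty}^{0} (2\pi\sigma^2\tau)^{-1/2}\exp\left[-\frac{(x - u + \mu\tau)^2}{2\sigma^2\tau} - \frac{2\mu}{\sigma^2}u\right] dx \\
&= \Phi\left(-\frac{u + \mu\tau}{\sigma\sqrt{\tau}}\right) + \exp\left(-\frac{2\mu}{\sigma^2}u\right)\Phi\left(-\frac{u - \mu\tau}{\sigma\sqrt{\tau}}\right). \qquad \square
\end{aligned}$$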


Corollary 8.25 The probability of ultimate ruin is given by

$$\psi(u) = 1 - \phi(u) = \Pr\{T < \infty\} = \exp\left(-\frac{2\mu}{\sigma^2}u\right). \tag{8.33}$$

By letting $\tau \to \infty$, this result follows from Theorem 8.24. It should be noted that the distribution (8.29) is a defective distribution because the cdf does not approach 1 as $\tau \to \infty$. The corresponding proper distribution is obtained by conditioning on ultimate ruin.
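A short Python sketch of (8.29) and (8.33) follows. It uses the Figure 8.3 values $u = 20$, $\mu = 5$, $\sigma^2 = 600$ and shows the finite-time ruin probability rising toward the ultimate ruin probability as $\tau$ grows. The function names are hypothetical, and the standard normal cdf is computed from the complementary error function.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def finite_time_ruin(u, tau, mu, sigma2):
    """psi(u, tau) from (8.29) for drift mu and variance parameter sigma2."""
    s = math.sqrt(sigma2 * tau)
    return (std_normal_cdf(-(u + mu * tau) / s)
            + math.exp(-2.0 * mu * u / sigma2) * std_normal_cdf(-(u - mu * tau) / s))

def ultimate_ruin(u, mu, sigma2):
    """psi(u) from (8.33)."""
    return math.exp(-2.0 * mu * u / sigma2)

u, mu, sigma2 = 20.0, 5.0, 600.0  # values used for Figure 8.3
for tau in (1.0, 4.0, 25.0, 100.0, 1e4):
    print(tau, finite_time_ruin(u, tau, mu, sigma2))
print("limit", ultimate_ruin(u, mu, sigma2))
```

Because (8.29) is defective, the printed values level off at $\exp(-2\mu u/\sigma^2)$ rather than at 1.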


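Corollary 8.26 The distribution of the time until ruin given that it occurs is given by

$$\Pr\{T < \tau \mid T < \infty\} = \frac{\psi(u, \tau)}{\psi(u)} = \exp\left(\frac{2\mu}{\sigma^2}u\right)\Phi\left(-\frac{u + \mu\tau}{\sigma\sqrt{\tau}}\right) + \Phi\left(-\frac{u - \mu\tau}{\sigma\sqrt{\tau}}\right), \quad \tau > 0. \tag{8.34}$$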


Corollary 8.27 The probability density function of the time until ruin is given by

$$f_T(\tau) = \frac{u}{\sqrt{2\pi\sigma^2}}\,\tau^{-3/2}\exp\left[-\frac{(u - \mu\tau)^2}{2\sigma^2\tau}\right], \quad \tau > 0. \tag{8.35}$$

This is obtained by differentiation of (8.34) with respect to $\tau$. It is not hard to see, using Appendix A, that if $\mu > 0$ then (8.35) is the pdf of an inverse Gaussian distribution with mean $u/\mu$ and variance $u\sigma^2/\mu^3$. With $\mu = 0$, ruin is certain and the time until ruin (8.35) has pdf

$$f_T(\tau) = \frac{u}{\sqrt{2\pi\sigma^2}}\,\tau^{-3/2}\exp\left(-\frac{u^2}{2\sigma^2\tau}\right), \quad \tau > 0,$$

and cdf [from (8.34)]

$$F_T(\tau) = 2\Phi\left(-\frac{u}{\sigma\sqrt{\tau}}\right), \quad \tau > 0.$$

This distribution is the one-sided stable law with index 1/2.
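As a numerical check that (8.35) is the derivative of (8.34), the following Python sketch integrates the density over $(0, \tau)$ with a simple midpoint rule and compares the result with the cdf. The parameter values are those of Figure 8.3, and the function names and step count are illustrative choices.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ruin_time_cdf(tau, u, mu, sigma2):
    """Pr(T < tau | T < infinity) from (8.34)."""
    s = math.sqrt(sigma2 * tau)
    return (math.exp(2.0 * mu * u / sigma2) * std_normal_cdf(-(u + mu * tau) / s)
            + std_normal_cdf(-(u - mu * tau) / s))

def ruin_time_pdf(tau, u, mu, sigma2):
    """f_T(tau) from (8.35)."""
    return (u / math.sqrt(2.0 * math.pi * sigma2) * tau ** -1.5
            * math.exp(-(u - mu * tau) ** 2 / (2.0 * sigma2 * tau)))

def midpoint_integral(f, a, b, steps=20000):
    """Midpoint rule; it never evaluates f at the endpoint tau = 0."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

u, mu, sigma2, tau = 20.0, 5.0, 600.0, 4.0
area = midpoint_integral(lambda s: ruin_time_pdf(s, u, mu, sigma2), 0.0, tau)
print(area, ruin_time_cdf(tau, u, mu, sigma2))  # the two values should agree closely
```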

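These results can be used as approximations for the original surplus process (8.26) based on the compound Poisson model. In this situation $c = (1 + \theta)\lambda E(X)$, where $\theta$ is the relative premium loading. We do this by substituting back using (8.27) and (8.28).

Then, for the process (8.26) from (8.29), (8.33), and (8.35), we have

$$\psi(u, \tau) \approx \Phi\left[-\frac{u + \theta\lambda\tau E(X)}{\sqrt{\lambda\tau E(X^2)}}\right] + \exp\left[-\frac{2E(X)}{E(X^2)}\theta u\right]\Phi\left[-\frac{u - \theta\lambda\tau E(X)}{\sqrt{\lambda\tau E(X^2)}}\right], \quad u > 0,\ \tau > 0,$$

$$\psi(u) \approx \exp\left[-\frac{2E(X)}{E(X^2)}\theta u\right], \quad u > 0,$$

and

$$f_T(\tau) \approx \frac{u}{\sqrt{2\pi\lambda E(X^2)}}\,\tau^{-3/2}\exp\left\{-\frac{[u - \theta\lambda\tau E(X)]^2}{2\lambda\tau E(X^2)}\right\}, \quad \tau > 0.$$

Then, for any compound Poisson-based process, it is easy to get simple numerical approximations. For example, the expected time until ruin, given that it occurs, is

$$E(T) = \frac{u}{\mu} = \frac{u}{\theta\lambda E(X)}. \tag{8.36}$$

Naturally, the accuracy of this approximation depends on the relative sizes of the quantities involved.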

It should be noted from (8.36) that the expected time of ruin, given that it occurs, depends (as we might expect) on the four key quantities which describe the surplus process. A higher initial surplus ($u$) increases the time to ruin, while increasing any of the other components decreases the expected time. This may appear surprising at first, but, for example, increasing the loading increases the rate of expected growth in surplus, making ruin difficult. Therefore, if ruin should occur, it will have to happen soon, before the high loading leads to large gains. If $\lambda$ is large, the company is essentially much larger and events happen more quickly. Therefore, ruin, if it happens, will occur sooner. Finally, a large value of $E(X)$ makes it easier for an early claim to wipe out the initial surplus.

All these are completely intuitive. However, formula (8.36) shows how each factor can have an influence on the expected ruin time.
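To give a feel for the size of the approximation error, the following Python sketch applies the Brownian approximation to the compound Poisson process of Figure 8.2 ($u = 20$, $c = 35$, $\lambda = 3$, exponential claims with mean 10). For comparison it uses the standard closed form for exponential claims, $\psi(u) = (1+\theta)^{-1}\exp\{-\theta u/[(1+\theta)E(X)]\}$, which is a well-known result and is not derived in this section. The function names are hypothetical.

```python
import math

def brownian_ultimate_ruin(u, theta, ex, ex2):
    """Brownian approximation to psi(u): exp(-2 E(X) theta u / E(X^2))."""
    return math.exp(-2.0 * ex * theta * u / ex2)

def expected_ruin_time(u, theta, lam, ex):
    """E(T) given that ruin occurs, from (8.36): u / (theta * lambda * E(X))."""
    return u / (theta * lam * ex)

def exact_exponential_ruin(u, theta, ex):
    """Closed form for exponential claims (standard result, assumed here for comparison)."""
    return math.exp(-theta * u / ((1.0 + theta) * ex)) / (1.0 + theta)

# Figure 8.2 setup: u = 20, c = 35, lambda = 3, exponential claims with mean 10.
u, c, lam, ex = 20.0, 35.0, 3.0, 10.0
ex2 = 2.0 * ex ** 2               # E(X^2) for an exponential distribution
theta = c / (lam * ex) - 1.0      # from c = (1 + theta) * lambda * E(X)

print(brownian_ultimate_ruin(u, theta, ex, ex2))  # Brownian approximation
print(exact_exponential_ruin(u, theta, ex))       # exact compound Poisson value
print(expected_ruin_time(u, theta, lam, ex))      # approximate expected time to ruin
```

With only three claims expected per unit time and a mean claim half the size of the initial surplus, the jumps are far from small, so a visible gap between the two ruin probabilities is to be expected.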

On a final note, it is also possible to use the compound Poisson-based risk process $Z_t$ as an approximation for a Brownian motion process. For known drift and variance parameters $\mu$ and $\sigma^2$, one can use (8.27) and (8.28) to obtain

$$\mu = \theta\lambda E(X), \tag{8.37}$$

$$\sigma^2 = \lambda E(X^2). \tag{8.38}$$

It is convenient to fix the jump sizes so that $E(X) = k$, say, and $E(X^2) = k^2$. Then we have

$$\lambda = \frac{\sigma^2}{k^2}, \tag{8.39}$$

$$\theta = \frac{\mu}{\lambda k} = \frac{\mu}{\sigma^2}k. \tag{8.40}$$

When $\mu$ and $\sigma^2$ are fixed, choosing a value of $k$ fixes $\lambda$ and $\theta$. Hence, the Poisson-based process can be used to approximate the Brownian motion with accuracy determined only by the parameter $k$. The smaller the value of $k$, the smaller the jump sizes and the larger the number of jumps per unit time.

Hence, simulation of the Brownian motion process can be done using the Poisson-based process. To simulate the Poisson-based process, the waiting times between successive events are generated first because these are exponentially distributed with mean $1/\lambda$. As $k$ becomes small, $\lambda$ becomes large and the mean waiting time becomes small.
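The simulation just described can be sketched in Python as follows. Jumps have the fixed size $k$, the rate and loading come from (8.39) and (8.40), and the premium rate is $c = (1 + \theta)\lambda k$. The target values $\mu = 5$ and $\sigma^2 = 600$ are those of Figure 8.3; the function names, the choice $k = 2$, the seed, and the number of draws are illustrative assumptions.

```python
import random

def poisson_parameters(mu, sigma2, k):
    """Return (lambda, theta, c) from (8.39), (8.40), and c = (1 + theta) * lambda * k."""
    lam = sigma2 / k ** 2
    theta = mu * k / sigma2
    return lam, theta, (1.0 + theta) * lam * k

def simulate_endpoint(t, mu, sigma2, k, rng):
    """One draw of Z_t = c*t - k*N_t, using exponential waiting times with mean 1/lambda."""
    lam, _, c = poisson_parameters(mu, sigma2, k)
    jumps, clock = 0, rng.expovariate(lam)
    while clock <= t:
        jumps += 1
        clock += rng.expovariate(lam)
    return c * t - k * jumps

rng = random.Random(7)  # arbitrary seed for reproducibility
mu, sigma2, k, t = 5.0, 600.0, 2.0, 1.0
draws = [simulate_endpoint(t, mu, sigma2, k, rng) for _ in range(4000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, mu * t)     # sample mean vs. target mu * t
print(var, sigma2 * t)  # sample variance vs. target sigma^2 * t
```

Reducing $k$ makes each path closer to a Brownian path but raises $\lambda = \sigma^2/k^2$, so the number of waiting times to generate grows quadratically.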


Part III

Construction of empirical models


Review of mathematical statistics


9.1 INTRODUCTION 


Before studying empirical models and then parametric models, we review some 
concepts from mathematical statistics. Mathematical statistics is a broad sub- 
ject that includes many topics not covered in this chapter. For those that are 
covered, it is assumed that the reader has had some prior exposure. The 
topics of greatest importance for constructing actuarial models are estimation 
and hypothesis testing. Because the Bayesian approach to statistical infer- 
ence is often either ignored or treated lightly in introductory mathematical 
statistics texts and courses, it receives more in-depth coverage in this text in 
Section 12.4. Bayesian methodology also provides the basis for the credibility 
methods covered in Chapter 16. 

To see the need for methods of statistical inference, consider the case where 
your boss needs a model for basic dental payments. One option is to simply 
announce the model. You announce that it is the lognormal distribution with 
$\mu = 5.1239$ and $\sigma = 1.0345$ (the many decimal places are designed to give
your announcement an aura of precision). When your boss, or a regulator, or 
an attorney who has put you on the witness stand asks you how you know 
that to be so, it will likely not be sufficient to answer that “I just know these
things.” It may not even be sufficient to announce that your friend at Gamma
Dental uses that model. 

An alternative is to collect some data and use it to formulate a model. 
Most distributional models have two components. The first is a name, such 
as “Pareto.” The second is the set of values of parameters that complete the 
specification. Matters would be simpler if modeling could be done in that 
order. Most of the time we need to fix the parameters that go with a named 
model before we can decide if we want to use that type of model. 

Because the parameter estimates are based on a sample from the population 
and not the entire population, the results will not be the true values. It is 
important to have an idea of the potential error. One way to express this is 
with an interval estimate. That is, rather than announcing a particular value, 
a range of plausible values is presented. 

When named parametric distributions are used, the parameterizations used 
are those from Appendices A and B. 


9.2 POINT ESTIMATION 


9.2.1 Introduction 


Regardless of how a model is estimated, it is extremely unlikely that the 
estimated model will exactly match the true distribution. Ideally, we would 
like to be able to measure the error we will be making when using the estimated 
model. But this is clearly impossible! If we knew the amount of error we 
had made, we could adjust our estimate by that amount and then have no 
error at all. The best we can do is discover how much error is inherent in 
repeated use of the procedure, as opposed to how much error we made with our 
current estimate. Therefore, this section is about the quality of the ensemble 
of answers produced from the procedure, not about the quality of a particular 
answer. 

This is an important point with regard to actuarial practice. What is im- 
portant is that an appropriate procedure be used, with everyone understand- 
ing that even the best procedure can lead to a poor result once the random 
future outcome has been revealed. This point is stated nicely in a Society of 
Actuaries principles draft ([124], pp. 779-780) regarding the level of adequacy
of a provision for a block of life risk obligations (that is, the probability that 
the company will have enough money to meet its contractual obligations): 


The indicated level of adequacy is prospective, but the actuarial model 
is generally validated against past experience. It is incorrect to con- 
clude on the basis of subsequent experience that the actuarial assump- 
tions were inappropriate or that the indicated level of adequacy was 
overstated or understated. 


When constructing models, there are a number of types of error. Several 
will not be covered here. Among them are model error (choosing the wrong
model) and sampling frame error (trying to draw inferences about a population
that differs from the one sampled). An example of model error is selecting
a Pareto distribution when the true distribution is Weibull. An example of 
sampling frame error is sampling claims from policies sold by independent 
agents in order to price policies sold over the Internet. 

The type of error we can measure is that due to using a sample from 
the population to make inferences about the entire population. Errors occur 
when the items sampled do not represent the population. As noted earlier, 
we cannot know if the particular items sampled today do or do not represent 
the population. We can, however, estimate the extent to which estimators are 
affected by the possibility of a nonrepresentative sample. 

The approach taken in this section is to consider all the samples that might 
be taken from the population. Each such sample leads to an estimated quan- 
tity (for example, a probability, a parameter value, or a moment). We do not 
expect the estimated quantities to always match the true value. For a sensible 
estimation procedure we do expect that for some samples the quantity will 
match the true value, for many it will be close, and for only a few it will be 
quite different. If we can construct a measure of how well the set of potential 
estimates matches the true value, we have a handle on the quality of our es- 
timation procedure. The approach outlined here is often called the classical 
or frequentist approach to estimation. 

Finally, we need a word about the difference between estimate and esti- 
mator. The former refers to the specific value obtained when applying an 
estimation procedure to a set of numbers. The latter refers to a rule or for- 
mula that produces the estimate. An estimate is a number or function while 
an estimator is a random variable or a random function. Usually, both the 
words and the context will make clear which is being referred to. 


9.2.2 Measures of quality 


9.2.2.1 Introduction There are a number of ways to measure the quality of 
an estimator. Three of them are discussed here. Two examples will be used 
throughout to illustrate them. 


Example 9.1 A population contains the values 1, 3, 5, and 9. We want to 
estimate the population mean by taking a sample of size 2 with replacement. 


Example 9.2 A population has the exponential distribution with a mean of $\theta$. We want to estimate the population mean by taking a sample of size 3 with replacement.


Both examples are clearly artificial in that we know the answers prior to sampling (4.5 and $\theta$). However, that knowledge will make apparent the error in the procedure we select. For practical applications, we will need to be able to estimate the error when we do not know the true value of the quantity being estimated.

9.2.2.2 Unbiasedness When constructing an estimator, it would be good if, on average, the errors we make cancel each other out. More formally, let $\theta$ be the quantity we want to estimate. Let $\hat{\theta}$ be the random variable that represents the estimator and let $E(\hat{\theta} \mid \theta)$ be the expected value of the estimator $\hat{\theta}$ when $\theta$ is the true parameter value.


Definition 9.3 An estimator, $\hat{\theta}$, is unbiased if $E(\hat{\theta} \mid \theta) = \theta$ for all $\theta$. The bias is $\mathrm{bias}_{\hat{\theta}}(\theta) = E(\hat{\theta} \mid \theta) - \theta$.


The bias depends on the estimator being used and may also depend on the particular value of $\theta$.


Example 9.4 For Example 9.1 determine the bias of the sample mean as an 
estimator of the population mean. 


The population mean is $\theta = 4.5$. The sample mean is the average of the two
observations. It is also the estimator we would use employing the empirical 
approach. In all cases, we assume that sampling is random. In other words, 
every sample of size n has the same chance of being drawn. Such sampling 
also implies that any member of the population has the same chance of being 
observed as any other member. For this example, there are 16 equally likely 
ways the sample could have turned out. They are listed below. 


1,1   1,3   1,5   1,9   3,1   3,3   3,5   3,9
5,1   5,3   5,5   5,9   9,1   9,3   9,5   9,9


This leads to the following 16 equally likely values for the sample mean:

1     2     3     5     2     3     4     6
3     4     5     7     5     6     7     9


Combining the common values, the sample mean, usually denoted $\bar{X}$, has the following probability distribution:

x                    1      2      3      4      5      6      7      9
$p_{\bar{X}}(x)$     1/16   2/16   3/16   2/16   3/16   2/16   2/16   1/16


The expected value of the estimator is

$$E(\bar{X}) = [1(1) + 2(2) + 3(3) + 4(2) + 5(3) + 6(2) + 7(2) + 9(1)]/16 = 4.5$$

and so the sample mean is an unbiased estimator of the population mean for this example. □
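The enumeration in this example is small enough to reproduce exactly. The following Python sketch lists all 16 ordered samples, tabulates the probability function of the sample mean with exact fractions, and computes the bias. Variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 3, 5, 9]

# All 16 ordered samples of size 2 drawn with replacement are equally likely.
samples = list(product(population, repeat=2))
means = [Fraction(a + b, 2) for a, b in samples]

# Probability function of the sample mean (fractions print in lowest terms).
pmf = {m: Fraction(n, len(samples)) for m, n in sorted(Counter(means).items())}
expected = sum(m * p for m, p in pmf.items())
bias = expected - Fraction(sum(population), len(population))

for m, p in pmf.items():
    print(m, p)
print("E(sample mean) =", expected, " bias =", bias)
```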
Example 9.5 For Example 9.2 determine the bias of the sample mean and the sample median as estimators of the population mean.


The sample mean is $\bar{X} = (X_1 + X_2 + X_3)/3$, where each $X_j$ represents one of the observations from the exponential population. Its expected value is

$$E(\bar{X}) = E\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{3}[E(X_1) + E(X_2) + E(X_3)] = \frac{1}{3}(\theta + \theta + \theta) = \theta$$

and therefore the sample mean is an unbiased estimator of the population mean.

Investigating the sample median is a bit more difficult. The distribution function of the middle of three observations can be found as follows, using $Y$ as the random variable of interest and $X$ as the random variable for an observation from the population:

$$\begin{aligned}
F_Y(y) &= \Pr(Y \le y) = \Pr(X_1, X_2, X_3 \le y) + \Pr(X_1, X_2 \le y, X_3 > y) \\
&\quad + \Pr(X_1, X_3 \le y, X_2 > y) + \Pr(X_2, X_3 \le y, X_1 > y) \\
&= F_X(y)^3 + 3F_X(y)^2[1 - F_X(y)] \\
&= [1 - e^{-y/\theta}]^3 + 3[1 - e^{-y/\theta}]^2 e^{-y/\theta}.
\end{aligned}$$

The density function is

$$f_Y(y) = F_Y'(y) = \frac{6}{\theta}\left(e^{-2y/\theta} - e^{-3y/\theta}\right).$$

The expected value of this estimator is

$$E(Y \mid \theta) = \int_0^\infty y\,\frac{6}{\theta}\left(e^{-2y/\theta} - e^{-3y/\theta}\right) dy = \frac{5\theta}{6}.$$

This estimator is clearly biased,¹ with $\mathrm{bias}_Y(\theta) = 5\theta/6 - \theta = -\theta/6$. On average, this estimator underestimates the true value. It is also easy to see that the sample median can be turned into an unbiased estimator by multiplying it by 1.2.


For Example 9.2 we have two estimators (the sample mean and 1.2 times 
the sample median) that are both unbiased. We will need additional criteria 
to decide which one we prefer. 
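The bias of the sample median can also be seen by simulation. The following Python sketch draws many samples of size 3 from an exponential population, averages the sample medians, and compares the result with $5\theta/6$; multiplying by 1.2 should bring the average back to about $\theta$. The value $\theta = 100$, the seed, and the number of trials are arbitrary illustrative choices.

```python
import random
import statistics

def median_of_three(theta, rng):
    """Sample median of three independent exponential observations with mean theta."""
    return statistics.median(rng.expovariate(1.0 / theta) for _ in range(3))

rng = random.Random(12345)  # arbitrary seed for reproducibility
theta, trials = 100.0, 100000
medians = [median_of_three(theta, rng) for _ in range(trials)]
avg_median = sum(medians) / trials

print(avg_median, 5.0 * theta / 6.0)  # simulated average vs. 5*theta/6
print(1.2 * avg_median, theta)        # rescaled estimator vs. theta
```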


¹The sample median is not likely to be a good estimator of the population mean. This example studies it for comparison purposes. Because the population median is $\theta \ln 2$, the sample median is biased for the population median.

Some estimators exhibit a small amount of bias, which vanishes as the sample size goes to infinity.


Definition 9.6 Let $\hat{\theta}_n$ be an estimator of $\theta$ based on a sample size of $n$. The estimator is asymptotically unbiased if

$$\lim_{n \to \infty} E(\hat{\theta}_n \mid \theta) = \theta$$

for all $\theta$.


Example 9.7 Suppose a random variable has the uniform distribution on the interval $(0, \theta)$. Consider the estimator $\hat{\theta}_n = \max(X_1, \ldots, X_n)$. Show that this estimator is asymptotically unbiased.


Let $Y_n$ be the maximum from a sample of size $n$. Then

$$F_{Y_n}(y) = \Pr(Y_n \le y) = \Pr(X_1 \le y, \ldots, X_n \le y) = [F_X(y)]^n = (y/\theta)^n,$$

$$f_{Y_n}(y) = \frac{ny^{n-1}}{\theta^n}, \quad 0 < y < \theta.$$

The expected value is

$$E(Y_n \mid \theta) = \int_0^\theta ny^n\theta^{-n}\,dy = \left.\frac{n}{n+1}y^{n+1}\theta^{-n}\right|_0^\theta = \frac{n\theta}{n+1}.$$

As $n \to \infty$, the limit is $\theta$, making this estimator asymptotically unbiased. □

9.2.2.3 Consistency A second desirable property of an estimator is that it works well for extremely large samples. Slightly more formally, as the sample size goes to infinity, the probability that the estimator is in error by more than a small amount goes to zero. A formal definition follows.


Definition 9.8 An estimator is consistent (often called, in this context, weakly consistent) if, for all $\delta > 0$ and any $\theta$,

$$\lim_{n \to \infty} \Pr(|\hat{\theta}_n - \theta| > \delta) = 0.$$

A sufficient (although not necessary) condition for weak consistency is that the estimator be asymptotically unbiased and $\mathrm{Var}(\hat{\theta}_n) \to 0$.


Example 9.9 Prove that, if the variance of a random variable is finite, the sample mean is a consistent estimator of the population mean.

From Exercise 9.2, the sample mean is unbiased. In addition,

$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}\sum_{j=1}^{n} X_j\right) = \frac{1}{n^2}\sum_{j=1}^{n}\mathrm{Var}(X_j) = \frac{\mathrm{Var}(X)}{n} \to 0.$$

The second step follows from assuming that the observations are independent. □


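Example 9.10 Show that the maximum observation from a uniform distribution on the interval $(0, \theta)$ is a consistent estimator of $\theta$.


From Example 9.7, the maximum is asymptotically unbiased. The second moment is

$$E(Y_n^2) = \int_0^\theta ny^{n+1}\theta^{-n}\,dy = \left.\frac{n}{n+2}y^{n+2}\theta^{-n}\right|_0^\theta = \frac{n\theta^2}{n+2},$$

and then

$$\mathrm{Var}(Y_n) = \frac{n\theta^2}{n+2} - \left(\frac{n\theta}{n+1}\right)^2 = \frac{n\theta^2}{(n+2)(n+1)^2} \to 0. \qquad \square$$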


9.2.2.4 Mean-squared error While consistency is nice, most estimators have 
this property. What would be truly impressive is an estimator that is not only 
correct on average but comes very close most of the time and, in particular, 
comes closer than rival estimators. One measure for a finite sample is moti- 
vated by the definition of consistency. The quality of an estimator could be 
measured by the probability that it gets within $\delta$ of the true value, that is, by 
measuring $\Pr(|\hat{\theta}_n - \theta| < \delta)$. But the choice of $\delta$ is arbitrary and we prefer 
measures that cannot be altered to suit the investigator's whim. Then we might 
consider $E(|\hat{\theta}_n - \theta|)$, the average absolute error. But we know that working 
with absolute values often presents unpleasant mathematical challenges, and 
so the following has become widely accepted as a measure of accuracy. 


Definition 9.11 The mean-squared error (MSE) of an estimator is 

$$\mathrm{MSE}_{\hat{\theta}}(\theta) = E[(\hat{\theta} - \theta)^2 \mid \theta].$$


Note that the MSE is a function of the true value of the parameter. An 
estimator may perform extremely well for some values of the parameter but 
poorly for others. 
Example 9.12 Consider the estimator $\hat{\theta} = 5$ of an unknown parameter $\theta$. 
The MSE is $(5 - \theta)^2$, which is very small when $\theta$ is near 5 but becomes poor 
for other values. Of course this estimate is both biased and inconsistent unless 
$\theta$ is exactly equal to 5. □


A result that follows directly from the various definitions is 

$$\mathrm{MSE}_{\hat{\theta}}(\theta) = E\{[\hat{\theta} - E(\hat{\theta} \mid \theta) + E(\hat{\theta} \mid \theta) - \theta]^2 \mid \theta\} = \mathrm{Var}(\hat{\theta} \mid \theta) + [\mathrm{bias}_{\hat{\theta}}(\theta)]^2. \qquad (9.1)$$
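The decomposition of MSE into variance plus squared bias can be confirmed by brute-force enumeration. The sketch below assumes a small hypothetical population (the five values are invented for illustration and do not come from the text) and uses the sample median of three draws without replacement as the estimator of the population mean.

```python
from itertools import combinations
from statistics import median

population = [2, 4, 4, 7, 13]  # hypothetical values, not from the text
theta = sum(population) / len(population)  # true population mean

# Every sample of size 3 drawn without replacement is equally likely.
estimates = [median(sample) for sample in combinations(population, 3)]
count = len(estimates)

mean_of_estimates = sum(estimates) / count
bias = mean_of_estimates - theta
variance = sum((e - mean_of_estimates) ** 2 for e in estimates) / count
mse = sum((e - theta) ** 2 for e in estimates) / count

print(f"bias={bias:.4f}  variance={variance:.4f}  MSE={mse:.4f}")
print(f"variance + bias^2 = {variance + bias ** 2:.4f}")
```

The two printed MSE figures should agree up to floating-point rounding, whatever population is substituted.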


If we restrict attention to only unbiased estimators, the best such could be 
defined as follows. 


Definition 9.13 An estimator, $\hat{\theta}$, is called a uniformly minimum variance 
unbiased estimator (UMVUE) if it is unbiased and for any true 
value of $\theta$ there is no other unbiased estimator that has a smaller variance. 


Because we are looking only at unbiased estimators, it would have been 
equally effective to make the definition in terms of MSE. We could also gen- 
eralize the definition by looking for estimators that are uniformly best with 
regard to MSE, but the previous example indicates why that is not feasible. 
There are a few theorems that can assist with the determination of UMVUEs. 
However, such estimators are difficult to determine. On the other hand, MSE 
is still a useful criterion for comparing two alternative estimators. 


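Example 9.14 For Example 9.2 compare the MSEs of the sample mean and 
1.2 times the sample median. 


The sample mean has variance 

$$\frac{\mathrm{Var}(X)}{3} = \frac{\theta^2}{3}.$$

When multiplied by 1.2, the sample median has second moment 

$$E[(1.2Y)^2] = 1.44 \int_0^\infty y^2 \, \frac{6}{\theta}\left(e^{-2y/\theta} - e^{-3y/\theta}\right) dy$$

$$= 1.44 \, \frac{6}{\theta}\left[ y^2\left(-\frac{\theta}{2} e^{-2y/\theta} + \frac{\theta}{3} e^{-3y/\theta}\right) - 2y\left(\frac{\theta^2}{4} e^{-2y/\theta} - \frac{\theta^2}{9} e^{-3y/\theta}\right) + 2\left(-\frac{\theta^3}{8} e^{-2y/\theta} + \frac{\theta^3}{27} e^{-3y/\theta}\right) \right]_0^\infty$$

$$= \frac{8.64}{\theta}\left(\frac{2\theta^3}{8} - \frac{2\theta^3}{27}\right) = \frac{38\theta^2}{25}$$

for a variance of 

$$\frac{38\theta^2}{25} - \theta^2 = \frac{13\theta^2}{25} > \frac{\theta^2}{3}.$$

The sample mean has the smaller MSE regardless of the true value of $\theta$. 
Therefore, for this problem, it is a superior estimator of $\theta$. □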


Example 9.15 For the uniform distribution on the interval $(0, \theta)$ compare 
the MSE of the estimators $2\bar{X}$ and $[(n+1)/n] \max(X_1, \ldots, X_n)$. Also evaluate 
the MSE of $\max(X_1, \ldots, X_n)$. 


The first two estimators are unbiased, so it is sufficient to compare their 
variances. For twice the sample mean, 

$$\mathrm{Var}(2\bar{X}) = \frac{4}{n} \mathrm{Var}(X) = \frac{4\theta^2}{12n} = \frac{\theta^2}{3n}.$$

For the adjusted maximum, the second moment is 

$$E\left[\left(\frac{n+1}{n} Y_n\right)^2\right] = \frac{(n+1)^2}{n^2} \, \frac{n\theta^2}{n+2} = \frac{(n+1)^2\theta^2}{(n+2)n}$$

for a variance of 

$$\frac{(n+1)^2\theta^2}{(n+2)n} - \theta^2 = \frac{\theta^2}{n(n+2)}.$$

Except for the case $n = 1$ (and then the two estimators are identical), the one 
based on the maximum has the smaller MSE. The third estimator is biased. 
For it, the MSE is 

$$\frac{n\theta^2}{(n+2)(n+1)^2} + \left(\frac{n\theta}{n+1} - \theta\right)^2 = \frac{2\theta^2}{(n+1)(n+2)},$$

which is also larger than that for the adjusted maximum. □
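The three MSE formulas of Example 9.15 can be tabulated with exact rational arithmetic. This is a sketch only: the function names are illustrative, and $\theta = 1$ is an arbitrary choice (every MSE here scales with $\theta^2$).

```python
from fractions import Fraction


def mse_twice_mean(n: int, theta: Fraction) -> Fraction:
    """Unbiased, so the MSE equals Var(2 * Xbar) = theta^2 / (3n)."""
    return theta ** 2 / (3 * n)


def mse_adjusted_max(n: int, theta: Fraction) -> Fraction:
    """Unbiased, so the MSE equals the second moment minus theta^2."""
    second_moment = Fraction((n + 1) ** 2, n * (n + 2)) * theta ** 2
    return second_moment - theta ** 2


def mse_max(n: int, theta: Fraction) -> Fraction:
    """Biased: MSE = variance + bias^2, using the moments from Example 9.10."""
    variance = Fraction(n, (n + 2) * (n + 1) ** 2) * theta ** 2
    bias = Fraction(n, n + 1) * theta - theta
    return variance + bias ** 2


theta = Fraction(1)  # arbitrary illustrative value
for n in (1, 2, 5, 20):
    print(n, mse_twice_mean(n, theta), mse_adjusted_max(n, theta), mse_max(n, theta))
```

For n = 1 all three columns coincide; for larger n the adjusted maximum has the smallest entry in each row.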


9.2.3 Exercises 


9.1 For Example 9.1, show that the mean of three observations drawn without 
replacement is an unbiased estimator of the population mean while the median 
of three observations drawn without replacement is a biased estimator of the 
population mean. 


9.2 Prove that for random samples the sample mean is always an unbiased 
estimator of the population mean. 


9.3 Let $X$ have the uniform distribution over the range $(\theta - 2, \theta + 2)$. That 
is, $f_X(x) = 0.25$, $\theta - 2 < x < \theta + 2$. Show that the median from a sample of 
size 3 is an unbiased estimator of $\theta$. 


9.4 Explain why the sample mean may not be a consistent estimator of the 
population mean for a Pareto distribution. 


9.5 For the sample of size 3 in Exercise 9.3, compare the MSE of the sample 
mean and median as estimates of $\theta$. 


9.6 (*) You are given two independent estimators of an unknown quantity $\theta$. 
For estimator A, $E(\hat{\theta}_A) = 1{,}000$ and $\mathrm{Var}(\hat{\theta}_A) = 160{,}000$, while for estimator 
B, $E(\hat{\theta}_B) = 1{,}200$ and $\mathrm{Var}(\hat{\theta}_B) = 40{,}000$. Estimator C is a weighted average, 
$\hat{\theta}_C = w\hat{\theta}_A + (1 - w)\hat{\theta}_B$. Determine the value of $w$ that minimizes $\mathrm{Var}(\hat{\theta}_C)$. 


9.7 (*) A population of losses has the Pareto distribution (see Appendix 
A) with $\theta = 6{,}000$ and $\alpha$ unknown. Simulation of the results from maximum 
likelihood estimation based on samples of size 10 has indicated that $E(\hat{\alpha}) = 2.2$ 
and $\mathrm{MSE}(\hat{\alpha}) = 1$. Determine $\mathrm{Var}(\hat{\alpha})$ if it is known that $\alpha = 2$. 


9.8 (*) Two instruments are available for measuring a particular nonzero 
distance. The random variable X represents a measurement with the first 
instrument, and the random variable Y with the second instrument. Assume 
$X$ and $Y$ are independent with $E(X) = 0.8m$, $E(Y) = m$, $\mathrm{Var}(X) = m^2$, and 
$\mathrm{Var}(Y) = 1.5m^2$, where $m$ is the true distance. Consider estimators of $m$ that 
are of the form $Z = \alpha X + \beta Y$. Determine the values of $\alpha$ and $\beta$ that make $Z$ 
a UMVUE within the class of estimators of this form. 


9.9 A population contains six members, with values 1, 1, 2, 3, 5, and 10. 
A random sample of size 3 is drawn without replacement. In each case the 
objective is to estimate the population mean. Note: A spreadsheet with an 
optimization routine may be the best way to solve this problem. 


(a) Determine the bias, variance, and MSE of the sample mean. 
(b) Determine the bias, variance, and MSE of the sample median. 


(c) Determine the bias, variance, and MSE of the sample midrange 
(the average of the largest and smallest observations). 
(d) Consider an arbitrary estimator of the form $aX_{(1)} + bX_{(2)} + cX_{(3)}$, 
where $X_{(1)} \le X_{(2)} \le X_{(3)}$ are the sample order statistics. 
i. Determine a restriction on the values of a, b, and c that will 
assure that the estimator is unbiased. 
ii. Determine the values of a, b, and c that will produce the un- 
biased estimator with the smallest variance. 
iii. Determine the values of a, b, and c that will produce the (pos- 
sibly biased) estimator with the smallest MSE. 


9.10 (*) Two different estimators, $\hat{\theta}_1$ and $\hat{\theta}_2$, are being considered. To test 
their performance, 75 trials have been simulated, each with the true value set 
at $\theta = 2$. The following totals were obtained: 

$$\sum_{j=1}^{75} \hat{\theta}_{1j} = 165, \quad \sum_{j=1}^{75} \hat{\theta}_{1j}^2 = 375, \quad \sum_{j=1}^{75} \hat{\theta}_{2j} = 147, \quad \sum_{j=1}^{75} \hat{\theta}_{2j}^2 = 312,$$

where $\hat{\theta}_{ij}$ is the estimate based on the $j$th simulation using estimator $\hat{\theta}_i$. 
Estimate the MSE for each estimator and determine the relative efficiency 
(the ratio of the MSEs). 


9.3 INTERVAL ESTIMATION 


All of the estimators discussed to this point have been point estimators. 
That is, the estimation process produces a single value that represents our 
best attempt to determine the value of the unknown population quantity. 
While that value may be a good one, we do not expect it to exactly match 
the true value. A more useful statement is often provided by an interval 
estimator. Instead of a single value, the result of the estimation process is 
a range of possible numbers, any of which is likely to be the true value. A 
specific type of interval estimator is the confidence interval. 


Definition 9.16 A $100(1 - \alpha)\%$ confidence interval for a parameter $\theta$ is 
a pair of values $L$ and $U$ computed from a random sample such that 
$\Pr(L \le \theta \le U) \ge 1 - \alpha$ for all $\theta$. 


Note that this definition does not uniquely specify the interval. Because the 
definition is a probability statement and must hold for all $\theta$, it says nothing 
about whether or not a particular interval encloses the true value of $\theta$ from a 
particular population. Instead, the level of confidence, $1 - \alpha$, is a property of 
the method used to obtain $L$ and $U$ and not of the particular values obtained. 
The proper interpretation is that, if we use a particular interval estimator 
over and over on a variety of samples, at least $100(1 - \alpha)\%$ of the time our 
interval will enclose the true value. 

Constructing confidence intervals is usually very difficult. For example, we 
know that, if a population has a normal distribution with unknown mean and 
variance, a $100(1 - \alpha)\%$ confidence interval for the mean uses 

$$L = \bar{X} - t_{\alpha/2,n-1}\, s/\sqrt{n}, \quad U = \bar{X} + t_{\alpha/2,n-1}\, s/\sqrt{n}, \qquad (9.2)$$

where $s = \sqrt{\sum_{j=1}^{n} (X_j - \bar{X})^2/(n-1)}$ and $t_{\alpha/2,b}$ is the $100(1 - \alpha/2)$th 
percentile of the $t$ distribution with $b$ degrees of freedom. But it takes a great 
deal of effort to verify that this is correct (see, for example, [58], p. 214). 

However, there is a method for constructing approximate confidence intervals 
that is often accessible. Suppose we have a point estimator $\hat{\theta}$ of parameter 
$\theta$ such that $E(\hat{\theta}) \doteq \theta$, $\mathrm{Var}(\hat{\theta}) \doteq v(\theta)$, and $\hat{\theta}$ has approximately a normal 
distribution. Theorem 12.13 shows that this is often the case. With all these 
approximations, we have that approximately 

$$1 - \alpha \doteq \Pr\left(-z_{\alpha/2} \le \frac{\hat{\theta} - \theta}{\sqrt{v(\theta)}} \le z_{\alpha/2}\right), \qquad (9.3)$$

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution. 
Solving for $\theta$ produces the desired interval. Sometimes this is difficult to do 
(due to the appearance of $\theta$ in the denominator) and so, if necessary, replace 
$v(\theta)$ in (9.3) with $v(\hat{\theta})$ to obtain a further approximation, 

$$1 - \alpha \doteq \Pr\left(\hat{\theta} - z_{\alpha/2}\sqrt{v(\hat{\theta})} \le \theta \le \hat{\theta} + z_{\alpha/2}\sqrt{v(\hat{\theta})}\right). \qquad (9.4)$$


Example 9.17 Use (9.4) to construct an approximate 95% confidence interval 
for the mean of a normal population with unknown variance. 


Use $\hat{\theta} = \bar{X}$ and then note that $E(\hat{\theta}) = \theta$, $\mathrm{Var}(\hat{\theta}) = \sigma^2/n$, and $\hat{\theta}$ does have a 
normal distribution. The confidence interval is then $\bar{X} \pm 1.96 s/\sqrt{n}$. Because 
$t_{0.025,n-1} > 1.96$, this approximate interval must be narrower than the exact 
interval given by (9.2). That means that our level of confidence is something 
less than 95%. □


Example 9.18 Use (9.3) and (9.4) to construct approximate 95% confidence 
intervals for the mean of a Poisson distribution. Obtain intervals for the 
particular case where $n = 25$ and $\bar{x} = 0.12$. 


Let $\hat{\theta} = \bar{X}$, the sample mean. For the Poisson distribution, $E(\hat{\theta}) = E(X) = \theta$ 
and $v(\theta) = \mathrm{Var}(\bar{X}) = \mathrm{Var}(X)/n = \theta/n$. For the first interval 

$$0.95 \doteq \Pr\left(-1.96 \le \frac{\bar{X} - \theta}{\sqrt{\theta/n}} \le 1.96\right)$$

is true if and only if 

$$|\bar{X} - \theta| \le 1.96\sqrt{\frac{\theta}{n}},$$

which is equivalent to 

$$(\bar{X} - \theta)^2 \le \frac{3.8416\,\theta}{n}$$

or 

$$\theta^2 - \theta\left(2\bar{X} + \frac{3.8416}{n}\right) + \bar{X}^2 \le 0.$$

Solving the quadratic produces the interval 

$$\bar{X} + \frac{1.9208}{n} \pm \frac{1}{2}\sqrt{\frac{15.3664\,\bar{X}}{n} + \frac{3.8416^2}{n^2}},$$

and for this problem the interval is $0.197 \pm 0.156$. 

For the second approximation the interval is $\bar{X} \pm 1.96\sqrt{\bar{X}/n}$ and for the 
example it is $0.12 \pm 0.136$. This interval extends below zero (which is not possible 
for the true value of $\theta$). This is because (9.4) is too crude an approximation 
in this case. □
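Both Poisson intervals from Example 9.18 are easy to compute directly. The sketch below is one possible arrangement, with illustrative function names; it assumes the normal percentile 1.96 used in the example.

```python
from math import sqrt


def poisson_ci_quadratic(xbar: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Interval from (9.3): solve (xbar - theta)^2 <= z^2 * theta / n for theta."""
    center = xbar + z * z / (2 * n)
    half_width = 0.5 * sqrt(4 * z * z * xbar / n + z ** 4 / n ** 2)
    return center - half_width, center + half_width


def poisson_ci_plug_in(xbar: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Interval from (9.4): replace v(theta) = theta / n with xbar / n."""
    half_width = z * sqrt(xbar / n)
    return xbar - half_width, xbar + half_width


low_q, high_q = poisson_ci_quadratic(0.12, 25)
low_p, high_p = poisson_ci_plug_in(0.12, 25)
print(f"(9.3): {low_q:.3f} to {high_q:.3f}")
print(f"(9.4): {low_p:.3f} to {high_p:.3f}")
```

The first interval stays positive, while the second has a negative lower limit, matching the remark that (9.4) is too crude here.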
9.3.1 Exercise 


9.11 Let $x_1, \ldots, x_n$ be a random sample from a population with pdf 
$f(x) = \theta^{-1} e^{-x/\theta}$, $x > 0$. This exponential distribution has a mean of $\theta$ and a variance 
of $\theta^2$. Consider the sample mean, $\bar{X}$, as an estimator of $\theta$. It turns out that 
$\bar{X}/\theta$ has a gamma distribution with $\alpha = n$ and $\theta = 1/n$, where in the second 
expression the "$\theta$" on the left is the parameter of the gamma distribution. For 
a sample of size 50 and a sample mean of 275, develop 95% confidence intervals 
by each of the following methods. In each case, if the formula requires the 
true value of $\theta$, substitute the estimated value. 


(a) Use the gamma distribution to determine an exact interval. 


(b) Use a normal approximation, estimating the variance prior to solving 
the inequalities. 


(c) Use a normal approximation, estimating $\theta$ after solving the 
inequalities. 


9.4 TESTS OF HYPOTHESES 


Hypothesis testing is covered in detail in most mathematical statistics texts. 
This review will be fairly straightforward and will not address philosophical 
issues or consider alternative approaches. A hypothesis test begins with two 
hypotheses, one called the null and one called the alternative. The traditional 
notation is $H_0$ for the null hypothesis and $H_1$ for the alternative hypothesis. 
The two hypotheses are not treated symmetrically. Reversing them may alter 
the results. To illustrate this process, a simple example will be used. 


Example 9.19 Your company has been basing its premiums on an assump- 
tion that the average claim is 1,200. You want to raise the premium and a 
regulator has insisted that you provide evidence that the average now exceeds 
1,200. To provide such evidence, the following numbers have been obtained. 
What are the hypotheses for this problem? 


  27   82  115  126  155    161    243    294    340     384 
 457  680  855  877  974  1,193  1,340  1,884  2,558  15,743 


Let $\mu$ be the population mean. One hypothesis (the one you claim is true) is 
that $\mu > 1{,}200$. Because hypothesis tests must present an either/or situation, 
the other hypothesis must be $\mu \le 1{,}200$. The only remaining task is to decide 
which of them is the null hypothesis. Whenever the universe of continuous 
possibilities is divided in two there is likely to be a boundary that needs to 
be assigned to one hypothesis or the other. The hypothesis that includes 
the boundary must be the null hypothesis. Therefore, the problem can be 
succinctly stated as: 

$$H_0 : \mu \le 1{,}200$$

$$H_1 : \mu > 1{,}200.$$ □


The decision is made by calculating a quantity called a test statistic. It 
is a function of the observations and is treated as a random variable. That is, 
in designing the test procedure we are concerned with the samples that might 
have been obtained and not with the particular sample that was obtained. 
The test specification is completed by constructing a rejection region. It 
is a subset of the possible values of the test statistic. If the value of the test 
statistic for the observed sample is in the rejection region, the null hypothesis 
is rejected and the alternative hypothesis is announced as the result that is 
supported by the data. Otherwise, the null hypothesis is not rejected (more 
on this later). The boundaries of the rejection region (other than plus or 
minus infinity) are called the critical values. 


Example 9.20 (Example 9.19 continued) Complete the test using the test 
statistic and rejection region that is promoted in most statistics books. Assume 
that the population has a normal distribution with standard deviation 3,435. 


The traditional test statistic for this problem is 

$$z = \frac{\bar{x} - 1{,}200}{3{,}435/\sqrt{20}} = 0.292$$

and the null hypothesis is rejected if $z > 1.645$. Because 0.292 is less than 
1.645, the null hypothesis is not rejected. The data do not support the 
assertion that the average claim exceeds 1,200. □
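The test statistic in Example 9.20 can be reproduced from the 20 payments. The following sketch takes the values as listed for Data Set B in Table 10.2 and uses the standard deviation 3,435 and critical value 1.645 stated in the example; the variable names are illustrative.

```python
from math import sqrt

# The 20 payments, as listed for Data Set B in Table 10.2.
claims = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]

mu_0 = 1200.0           # boundary value of the null hypothesis
sigma = 3435.0          # population standard deviation assumed in the example
critical_value = 1.645  # upper 5% point of the standard normal distribution

sample_mean = sum(claims) / len(claims)
z = (sample_mean - mu_0) / (sigma / sqrt(len(claims)))
reject_null = z > critical_value

print(f"sample mean = {sample_mean:.1f}, z = {z:.3f}, reject H0: {reject_null}")
```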


The test in the previous example was constructed to meet certain objec- 
tives. The first objective is to control what is called the Type I error. It is the 
error made when the test rejects the null hypothesis in a situation where it 
happens to be true. In the example, the null hypothesis can be true in more 
than one way. This leads to the most common measure of the propensity of 
a test to make a Type I error. 


Definition 9.21 The significance level of a hypothesis test is the probabil- 
ity of making a Type I error given that the null hypothesis is true. If it can be 
true in more than one way, the level of significance is the maximum of such 
probabilities. The significance level is usually denoted by the letter a. 


This is a conservative definition in that it looks at the worst case. It is 
typically a case that is on the boundary between the two hypotheses. 
Example 9.22 Determine the level of significance for the test in Example 
9.20. 


Begin by computing the probability of making a Type I error when the null 
hypothesis is true with $\mu = 1{,}200$. Then, 

$$\Pr(Z > 1.645 \mid \mu = 1{,}200) = 0.05.$$

That is because the assumptions imply that $Z$ has a standard normal 
distribution. 


Now suppose $\mu$ has a value that is below 1,200. Then 

$$\Pr\left(\frac{\bar{X} - 1{,}200}{3{,}435/\sqrt{20}} > 1.645\right) = \Pr\left(\frac{\bar{X} - \mu + \mu - 1{,}200}{3{,}435/\sqrt{20}} > 1.645\right)$$

$$= \Pr\left(\frac{\bar{X} - \mu}{3{,}435/\sqrt{20}} > 1.645 - \frac{\mu - 1{,}200}{3{,}435/\sqrt{20}}\right).$$

Because $\mu$ is known to be less than 1,200, the right-hand side is always greater 
than 1.645. The left-hand side has a standard normal distribution and 
therefore the probability is less than 0.05. Therefore the significance level is 0.05. □


The significance level is usually set in advance and is often between 1 
and 10%. The second objective is to keep the Type II error (not rejecting 
the null hypothesis when the alternative is true) probability small. Generally, 
attempts to reduce the probability of one type of error increase the probability 
of the other. The best we can do once the significance level has been set is 
to make the Type II error as small as possible, though there is no assurance 
that the probability will be a small number. The best test is one that meets 
the following requirement. 


Definition 9.23 A hypothesis test is uniformly most powerful if no other 
test exists that has the same or lower significance level and for a particular 
value within the alternative hypothesis has a smaller probability of making a 
Type II error. 


Example 9.24 (Example 9.22 continued) Determine the probability of making 
a Type II error when the alternative hypothesis is true with $\mu = 2{,}000$. 

$$\Pr\left(\frac{\bar{X} - 1{,}200}{3{,}435/\sqrt{20}} < 1.645 \,\Big|\, \mu = 2{,}000\right) = \Pr(\bar{X} - 1{,}200 < 1{,}263.51 \mid \mu = 2{,}000)$$

$$= \Pr(\bar{X} < 2{,}463.51 \mid \mu = 2{,}000)$$

$$= \Pr\left(\frac{\bar{X} - 2{,}000}{3{,}435/\sqrt{20}} < \frac{2{,}463.51 - 2{,}000}{3{,}435/\sqrt{20}} = 0.6035\right) = 0.7269.$$

For this value of $\mu$, the test is not very powerful, having over a 70% chance of 
making a Type II error. Nevertheless (though this is not easy to prove), the 
test used is the most powerful test for this problem. □
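The Type I and Type II error probabilities in Examples 9.22 and 9.24 both come from one function of $\mu$: the probability that the test rejects. A minimal sketch follows, with illustrative names and the normal distribution function built from `math.erf`; the default arguments are the values used in the examples.

```python
from math import erf, sqrt


def standard_normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def rejection_probability(mu: float, mu_0: float = 1200.0, sigma: float = 3435.0,
                          n: int = 20, critical_value: float = 1.645) -> float:
    """Pr(reject H0) when the true mean is mu and the sample mean is normal."""
    standard_error = sigma / sqrt(n)
    # Reject when (Xbar - mu_0)/se > c, i.e. when (Xbar - mu)/se > c - (mu - mu_0)/se.
    return 1.0 - standard_normal_cdf(critical_value - (mu - mu_0) / standard_error)


type_1_at_boundary = rejection_probability(1200.0)
type_2_at_2000 = 1.0 - rejection_probability(2000.0)
print(f"Type I error probability at mu = 1,200: {type_1_at_boundary:.4f}")
print(f"Type II error probability at mu = 2,000: {type_2_at_2000:.4f}")
```

Evaluating `rejection_probability` at values of $\mu$ below 1,200 gives numbers below the boundary value, which is why the significance level is attained at $\mu = 1{,}200$.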


Because the Type II error probability can be high, it is customary to not 
make a strong statement when the null hypothesis is not rejected. Rather 
than say we choose or accept the null hypothesis, we say that we fail to reject 
it. That is, there was not enough evidence in the sample to make a strong 
argument in favor of the alternative hypothesis, so we take no stand at all. 

A common criticism of this approach to hypothesis testing is that the choice 
of the significance level is arbitrary. In fact, by changing the significance level, 
any result can be obtained. 


Example 9.25 (Example 9.24 continued) Complete the test using a significance 
level of $\alpha = 0.45$. Then determine the range of significance levels for 
which the null hypothesis is rejected and for which it is not rejected. 


Because $\Pr(Z > 0.1257) = 0.45$, the null hypothesis is rejected when 

$$\frac{\bar{X} - 1{,}200}{3{,}435/\sqrt{20}} > 0.1257.$$

In this example, the test statistic is 0.292, which is in the rejection region, 
and thus the null hypothesis is rejected. Of course, few people would place 
confidence in the results of a test that was designed to make errors 45% of 
the time. Because $\Pr(Z > 0.292) = 0.3851$, the null hypothesis is rejected for 
those who select a significance level that is greater than 38.51% and is not 
rejected by those who use a significance level that is less than 38.51%. □


Few people are willing to make errors 38.51% of the time. Announcing this 
figure is more persuasive than the earlier conclusion based on a 5% significance 
level. When a significance level is used, readers are left to wonder what the 
outcome would have been with other significance levels. The value of 38.51% 
is called a p-value. A working definition is: 


Definition 9.26 For a hypothesis test, the p-value is the probability that the 
test statistic takes on a value that is less in agreement with the null hypothesis 
than the value obtained from the sample. Tests conducted at a significance level 
that is greater than the p-value will lead to a rejection of the null hypothesis, 
while tests conducted at a significance level that is smaller than the p-value 
will lead to a failure to reject the null hypothesis. 


Also, because the p-value must be between 0 and 1, it is on a scale that 
carries some meaning. The closer to zero the value is, the more support the 
data give to the alternative hypothesis. Common practice is that values above 
10% indicate that the data provide no evidence in support of the alternative 
hypothesis, while values below 1% indicate strong support for the alternative 
hypothesis. Values in between indicate uncertainty as to the appropriate 
conclusion and may call for more data or a more careful look at the data or 
the experiment that produced it. 
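The p-value for the running example, and the decision it implies at several significance levels, can be sketched as follows. The test statistic 0.292 is taken from Example 9.20; the function names and the particular list of significance levels are illustrative choices.

```python
from math import erf, sqrt


def standard_normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def upper_tail_p_value(z: float) -> float:
    """p-value for a test that rejects for large values of a standard normal statistic."""
    return 1.0 - standard_normal_cdf(z)


z_observed = 0.292  # test statistic from Example 9.20
p_value = upper_tail_p_value(z_observed)
print(f"p-value = {p_value:.4f}")

for alpha in (0.01, 0.05, 0.10, 0.45):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"significance level {alpha:.2f}: {decision}")
```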


9.4.1 Exercise 


9.12 (Exercise 9.11 continued) Test $H_0 : \theta \ge 325$ vs $H_1 : \theta < 325$ using 
a significance level of 5% and the sample mean as the test statistic. Also, 
compute the p-value. Do this using the exact distribution of the test statistic 
and a normal approximation. 


Estimation for 
complete data 


10.1 INTRODUCTION 


The material in this and the next chapter has been traditionally presented 
under the heading of “survival models” with the accompanying notion that 
the techniques are useful only when studying lifetime distributions. Standard 
texts on the subject such as Klein and Moeschberger [74] and Lawless [81] 
contain examples that are exclusively oriented in that direction. However, 
as will be seen in these two chapters, the same problems that occur when 
modeling lifetime occur when modeling payment amount. The examples we 
present will be of both types. To emphasize that point, some of the starred 
exercises were taken from the former Society of Actuaries Course 160 exam, 
but the setting was changed to a payment environment. Only a handful 
of references are presented, most of the results being well developed in the 
survival models literature. Readers wanting more detail and proofs should 
consult a text dedicated to the subject, such as the ones mentioned above. 
In this chapter it is assumed that the type of model is known but not the 
full description of the model. In Chapter 4, models were divided into two 
types—data-dependent and parametric. The definitions are repeated below. 


Loss Models: From Data to Decisions, Second Edition. 
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot 
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc. 
Definition 10.1 A data-dependent distribution is at least as complex as 
the data or knowledge that produced it and the number of “parameters” 
increases as the number of data points or amount of knowledge increases. 


Definition 10.2 A parametric distribution is a set of distribution func- 
tions, each member of which is determined by specifying one or more values 
called parameters. The number of parameters is fixed and finite. 


Here, only two data-dependent distributions will be considered. They de- 
pend on the data in similar ways. The simplest definitions for the two types 
considered appear below. 


Definition 10.3 The empirical distribution is obtained by assigning prob- 
ability 1/n to each data point. 


Definition 10.4 A kernel smoothed distribution is obtained by replacing 
each data point with a continuous random variable and then assigning proba- 
bility 1/n to each such random variable. The random variables used must be 
identical except for a location or scale change that is related to its associated 
data point. 


Note that the empirical distribution is a special type of kernel smoothed 
distribution in which the random variable assigns probability 1 to the data 
point. An alternative to the empirical distribution that is similar in spirit but 
produces different numbers will also be presented. In Chapter 11 it will be 
shown how the definition can be modified to account for data that have been 
altered through censoring and truncation. With regard to kernel smoothing, 
there are several distributions that could be used, a few of which are intro- 
duced in Section 11.3. 

Throughout this part, four examples will be used repeatedly. Because they 
are simply data sets, they will be referred to as Data Sets A, B, C and D. 


Data Set A This data set is well-known in the casualty actuarial literature. 
It was first analyzed in the paper [30] by Dropkin in 1959. He collected data 
from 1956-1958 on the number of accidents by one driver in one year. The 
results for 94,935 drivers are in Table 10.1. 


Data Set B These numbers are artificial. They represent the amounts paid 
on workers compensation medical benefits but are not related to any particular 
policy or set of policyholders. These payments are the full amount of the loss. 
A random sample of 20 payments is given in Table 10.2. 


Data Set C These observations represent payments on 227 claims from a 
general liability insurance policy. The data are in Table 10.3. 


Data Set D These numbers are artificial. They represent the time at which a 
five-year term insurance policy terminates. For some policyholders, termina- 
tion is by death, for some it is by surrender (the cancellation of the insurance 
Table 10.1 Data Set A 

Number of accidents    Number of drivers 
0                      81,714 
1                      11,306 
2                       1,618 
3                         250 
4                          40 
5 or more                   7 


Table 10.2 Data Set B 

  27   82  115  126  155    161    243    294    340     384 
 457  680  855  877  974  1,193  1,340  1,884  2,558  15,743 


Table 10.3 Data Set C 

Payment range        Number of payments 
0-7,500              99 
7,500-17,500         42 
17,500-32,500        29 
32,500-67,500        28 
67,500-125,000       17 
125,000-300,000       9 
Over 300,000          3 

contract), and for the remainder it is expiration of the five-year period. Two 
separate versions are presented. For Data Set D1 (Table 10.4) there were 30 
policies observed from issue. For each, both the time of death and time of sur- 
render are presented, provided they were before the expiration of the five-year 
period. Of course, normally we do not know the time of death of policyholders 
who surrender and we do not know when policyholders who died would have 
surrendered had they not died. Note that the final 12 policyholders survived 
both death and surrender to the end of the five-year period. 

For Data Set D2 (Table 10.5), only the time of the first event is observed. 
In addition, there are 10 more policyholders who were first observed at some 
time after the policy was issued. The table presents the results for all 40 
policies. The column headed “First observed” gives the duration at which the 
policy was first observed; the column headed “Last observed” gives the duration 
at which the policy was last observed; and the column headed “Event” is coded 
“s” for surrender, “d” for death, and “e” for expiration of the five-year period. 
Table 10.4 Data Set D1 

Policyholder    Time of death    Time of surrender 
1               -                0.1 
2               4.8              0.5 
3               -                0.8 
4               0.8              3.9 
5               3.1              1.8 
6               -                1.8 
7               -                2.1 
8               -                2.5 
9               -                2.8 
10              2.9              4.6 
11              2.9              4.6 
12              -                3.9 
13              4.0              - 
14              -                4.0 
15              -                4.1 
16              4.8              - 
17              -                4.8 
18              -                4.8 
19-30           -                - 


When observations are collected from a probability distribution, the ideal 
situation is to have the (essentially) exact¹ value of each observation. This 
case is referred to as “complete, individual data.” This is the case in Data 
Sets B and D1. There are two reasons why exact data may not be available. 
One is grouping, in which all that is recorded is the range of values in which 
the observation belongs. This is the case for Data Set C and for Data Set A 
for those with five or more accidents. 

A second reason that exact values may not be available is the presence of 
censoring or truncation. When data are censored from below, observations 
below a given value are known to be below that value but the exact value is 
unknown. When data are censored from above, observations above a given 
value are known to be above that value but the exact value is unknown. Note 
that censoring effectively creates grouped data. When the data are grouped 
in the first place, censoring has no effect. For example, the data in Data Set 
C may have been censored from above at 300,000, but we cannot know for sure 


¹Some measurements are never exact. Ages may be rounded to the nearest whole 
number, monetary amounts to the nearest dollar, car mileage to the nearest tenth 
of a mile, and so on. This text is not concerned with such rounding errors. 
Rounded values will be treated as if they are exact. 
Table 10.5 Data Set D2 

          First      Last                          First      Last 
Policy    observed   observed   Event    Policy    observed   observed   Event 
1         0          0.1        s        16        0          4.8        d 
2         0          0.5        s        17        0          4.8        s 
3         0          0.8        s        18        0          4.8        s 
4         0          0.8        d        19-30     0          5.0        e 
5         0          1.8        s        31        0.3        5.0        e 
6         0          1.8        s        32        0.7        5.0        e 
7         0          2.1        s        33        1.0        4.1        d 
8         0          2.5        s        34        1.8        3.1        d 
9         0          2.8        s        35        2.1        3.9        s 
10        0          2.9        d        36        2.9        5.0        e 
11        0          2.9        d        37        2.9        4.8        s 
12        0          3.9        s        38        3.2        4.0        d 
13        0          4.0        d        39        3.4        5.0        e 
14        0          4.0        s        40        3.9        5.0        e 
15        0          4.1        s 


from the data set and that knowledge has no effect on how we treat the data. 
On the other hand, were Data Set B to be censored at 1,000, we would have 
15 individual observations and then 5 grouped observations in the interval 
from 1,000 to infinity. 

In insurance settings, censoring from above is fairly common. For example, 
if a policy pays no more than 100,000 for an accident, then any time the loss is 
above 100,000 the actual amount will be unknown, but we will know that it 
happened. In Data Set D2 we have random censoring. Consider the fifth 
policy in the table. When the “other information” is not available, all that 
is known about the time of death is that it will be after 1.8 years. All of the 
policies are censored at 5 years by the nature of the policy itself. Also, note 
that Data Set A has been censored from above at 5. This is more common 
language than to say that Data Set A has some individual data and some 
grouped data. 

When data are truncated from below, observations below a given value 
are not recorded. Truncation from above implies that observations above a 
given value are not recorded. In insurance settings, truncation from below 
is fairly common. If an automobile physical damage policy has a per claim 
deductible of 250, any losses below 250 will not come to the attention of the 
insurance company and so will not appear in any data sets. Data Set D2 
has observations 31-40 truncated from below at varying values. Data sets 
may have truncation forced on them. For example, if Data Set B were to be 
truncated from below at 250, the first 7 observations would disappear and the 
remaining 13 would be unchanged. 
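The two hypothetical modifications of Data Set B described above (censoring from above at 1,000 and truncation from below at 250) can be mimicked in a few lines. This is a sketch under one stated assumption: an observation exactly equal to the limit is treated as censored, and one exactly equal to the deductible is retained. Neither boundary case arises in this data set.

```python
# Data Set B (Table 10.2).
data_set_b = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]


def censor_from_above(values: list[int], limit: int) -> tuple[list[int], int]:
    """Return the exact values below the limit and the count known only to be at or above it."""
    exact = [v for v in values if v < limit]
    return exact, len(values) - len(exact)


def truncate_from_below(values: list[int], deductible: int) -> list[int]:
    """Observations below the truncation point are never recorded; the rest are unchanged."""
    return [v for v in values if v >= deductible]


exact_values, censored_count = censor_from_above(data_set_b, 1000)
recorded = truncate_from_below(data_set_b, 250)
print(f"censored at 1,000: {len(exact_values)} exact, {censored_count} censored")
print(f"truncated at 250: {len(recorded)} of {len(data_set_b)} observations recorded")
```

The key difference shows up in the counts: censoring keeps all 20 observations (some only as ranges), while truncation loses observations entirely.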
10.2 THE EMPIRICAL DISTRIBUTION FOR COMPLETE, 
INDIVIDUAL DATA 


As noted in Definition 10.3, the empirical distribution assigns probability $1/n$ 
to each data point. That works well when the value of each data point is 
recorded. An alternative definition follows: 


Definition 10.5 The empirical distribution function is 

$$F_n(x) = \frac{\text{number of observations} \le x}{n},$$

where $n$ is the total number of observations. 


Example 10.6 Provide the empirical probability functions for the data in 
Data Sets A and B. For Data Set A also provide the empirical distribution 
function. For Data Set A assume all seven drivers who had five or more 
accidents had exactly five accidents. 


For notation, a subscript of the sample size (or of $n$ if the sample size is not 
known) will be used to indicate an empirical function. Without the subscript, 
the function represents the true function for the underlying random variable. 
For Data Set A, the estimated probability function is 

$$p_{94,935}(x) = \begin{cases} 81{,}714/94{,}935 = 0.860736, & x = 0, \\ 11{,}306/94{,}935 = 0.119092, & x = 1, \\ 1{,}618/94{,}935 = 0.017043, & x = 2, \\ 250/94{,}935 = 0.002633, & x = 3, \\ 40/94{,}935 = 0.000421, & x = 4, \\ 7/94{,}935 = 0.000074, & x = 5, \end{cases}$$

where the values add to 0.999999 due to rounding. The distribution function 
is a step function with jumps at each data point. 

$$F_{94,935}(x) = \begin{cases} 0/94{,}935 = 0.000000, & x < 0, \\ 81{,}714/94{,}935 = 0.860736, & 0 \le x < 1, \\ 93{,}020/94{,}935 = 0.979828, & 1 \le x < 2, \\ 94{,}638/94{,}935 = 0.996872, & 2 \le x < 3, \\ 94{,}888/94{,}935 = 0.999505, & 3 \le x < 4, \\ 94{,}928/94{,}935 = 0.999926, & 4 \le x < 5, \\ 94{,}935/94{,}935 = 1.000000, & x \ge 5. \end{cases}$$

For Data Set B, 

$$p_{20}(x) = \begin{cases} 0.05, & x = 27, \\ 0.05, & x = 82, \\ 0.05, & x = 115, \\ \quad\vdots & \\ 0.05, & x = 15{,}743. \end{cases}$$ □
As noted in the example, the empirical model is a discrete distribution.
Therefore, the derivative required to create the density and hazard rate
functions cannot be taken. The closest we can come, in an empirical sense, to
estimating the hazard rate function is to estimate the cumulative hazard rate
function, defined as follows.


Definition 10.7 The cumulative hazard rate function is defined as

$$H(x) = -\ln S(x).$$

The name comes from the fact that, if S(x) is differentiable,

$$H'(x) = -\frac{S'(x)}{S(x)} = \frac{f(x)}{S(x)} = h(x)$$

and then

$$H(x) = \int_{-\infty}^{x} h(y)\,dy.$$

The distribution function can be obtained from $F(x) = 1 - S(x) = 1 - e^{-H(x)}$.
Therefore, estimating the cumulative hazard function provides an alternative
way to estimate the distribution function.

In order to define empirical estimates, some additional notation is needed.
For a sample of size n, let $y_1 < y_2 < \cdots < y_k$ be the k unique values that
appear in the sample, where k must be less than or equal to n. Let $s_j$ be the
number of times the observation $y_j$ appears in the sample. Thus, $\sum_{j=1}^{k} s_j = n$.
Also of interest is the number of observations in the data set that are greater
than or equal to a given value. Both the observations and the number of
observations are referred to as the risk set. Let $r_j = \sum_{i=j}^{k} s_i$ be the number
of observations greater than or equal to $y_j$. Using this notation, the empirical
distribution function is

$$F_n(x) = \begin{cases}
0, & x < y_1, \\
1 - \dfrac{r_j}{n}, & y_{j-1} \le x < y_j,\ j = 2, \ldots, k, \\
1, & x \ge y_k.
\end{cases}$$


Example 10.8 Consider a data set containing the numbers 1.0, 1.3, 1.5, 1.5, 
2.1, 2.1, 2.1, and 2.8. Determine the quantities described in the previous 
paragraph and then obtain the empirical distribution function. 


There are five unique values and thus k = 5. Values of yj, sj, and rj are 
given in Table 10.6. 
Table 10.6 Values for Example 10.8

j    y_j    s_j    r_j
1    1.0    1      8
2    1.3    1      7
3    1.5    2      6
4    2.1    3      4
5    2.8    1      1

$$F_8(x) = \begin{cases}
0, & x < 1.0, \\
1 - \frac{7}{8} = 0.125, & 1.0 \le x < 1.3, \\
1 - \frac{6}{8} = 0.250, & 1.3 \le x < 1.5, \\
1 - \frac{4}{8} = 0.500, & 1.5 \le x < 2.1, \\
1 - \frac{1}{8} = 0.875, & 2.1 \le x < 2.8, \\
1, & x \ge 2.8.
\end{cases}$$
□


Definition 10.9 The Nelson-Åalen estimate ([1],[99]) of the cumulative
hazard rate function is

$$\hat H(x) = \begin{cases}
0, & x < y_1, \\
\sum_{i=1}^{j-1} \dfrac{s_i}{r_i}, & y_{j-1} \le x < y_j,\ j = 2, \ldots, k, \\
\sum_{i=1}^{k} \dfrac{s_i}{r_i}, & x \ge y_k.
\end{cases}$$

Because this is a step function, its derivatives (which would provide an
estimate of the hazard rate function) are not interesting. An intuitive derivation
of this estimator can be found on page 302.


Example 10.10 Determine the Nelson-Åalen estimate for the previous
example.

We have

$$\hat H(x) = \begin{cases}
0, & x < 1.0, \\
\frac{1}{8} = 0.125, & 1.0 \le x < 1.3, \\
0.125 + \frac{1}{7} = 0.268, & 1.3 \le x < 1.5, \\
0.268 + \frac{2}{6} = 0.601, & 1.5 \le x < 2.1, \\
0.601 + \frac{3}{4} = 1.351, & 2.1 \le x < 2.8, \\
1.351 + \frac{1}{1} = 2.351, & x \ge 2.8.
\end{cases}$$

These values can be used to produce an alternative estimate of the distribution
function via exponentiation. For example, for $1.5 \le x < 2.1$, the estimate is
$\hat F(x) = 1 - e^{-0.601} = 0.452$, which is not equal to the empirical estimate of
0.5. When a function is estimated by other methods, the function has a caret
(hat) placed on it. □


Table 10.7 Data for Example 10.11

j    y_j    s_j    r_j    S_30(x)           H-hat(x)                   S-hat(x) = exp(-H-hat(x))
1    0.8    1      30     29/30 = 0.9667    1/30 = 0.0333              0.9672
2    2.9    2      29     27/30 = 0.9000    0.0333 + 2/29 = 0.1023     0.9028
3    3.1    1      27     26/30 = 0.8667    0.1023 + 1/27 = 0.1393     0.8700
4    4.0    1      26     25/30 = 0.8333    0.1393 + 1/26 = 0.1778     0.8371
5    4.8    2      25     23/30 = 0.7667    0.1778 + 2/25 = 0.2578     0.7727
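
The quantities in Examples 10.8 and 10.10 can be checked with a short Python sketch. It is only an illustration: the function names (`risk_table`, `empirical_cdf`, `nelson_aalen`) are hypothetical and not part of the text, and the code assumes complete, individual data with no truncation or censoring.

```python
from collections import Counter
from math import exp

def risk_table(sample):
    """Return (y, s, r): unique values, their multiplicities, and risk sets."""
    counts = Counter(sample)
    y = sorted(counts)
    s = [counts[v] for v in y]
    # r_j = s_j + s_{j+1} + ... + s_k, the number of observations >= y_j
    r = [sum(s[j:]) for j in range(len(y))]
    return y, s, r

def empirical_cdf(sample, x):
    """F_n(x): proportion of observations less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def nelson_aalen(sample, x):
    """Cumulative hazard estimate: sum of s_i / r_i over all y_i <= x."""
    y, s, r = risk_table(sample)
    return sum(sj / rj for yj, sj, rj in zip(y, s, r) if yj <= x)

# Data from Example 10.8
data = [1.0, 1.3, 1.5, 1.5, 2.1, 2.1, 2.1, 2.8]
y, s, r = risk_table(data)
for yj in y:
    H = nelson_aalen(data, yj)
    # columns: y_j, empirical F, Nelson-Aalen H, alternative F = 1 - exp(-H)
    print(yj, empirical_cdf(data, yj), round(H, 3), round(1 - exp(-H), 3))
```

Each printed row applies on the interval from the current y value up to (but not including) the next one, which matches the step-function form used in the examples.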


Example 10.11 Determine the empirical survival function and Nelson-Åalen 
estimate of the cumulative hazard rate function for the time to death for Data 
Set D1. Estimate the survival function from the Nelson-Åalen estimate. As- 
sume that the death time is known for those who surrender. 


The calculations are in Table 10.7. For the empirical functions, the values
given are for the interval from (and including) the current y value to (but not
including) the next y value. □


For this particular problem, where it is known that all policyholders
terminate at time 5, results past 5 are not interesting. The methods of obtaining
an empirical distribution that have been introduced work only when the
individual observations are available and there is no truncation or censoring. The
following chapter will introduce modifications for those situations.


10.2.1 Exercises 


10.1 Obtain the empirical distribution function and the Nelson—Aalen esti- 
mate of the distribution function for the time to surrender using Data Set D1. 
Assume that the surrender time is known for those who die. 


10.2 The data in Table 10.8 are from Loss Distributions [59], p. 128. It 
represents the total damage done by 35 hurricanes between the years 1949 
and 1980. The losses have been adjusted for inflation (using the Residential 
Construction Index) to be in 1981 dollars. The entries represent all hurricanes 
for which the trended loss was in excess of 5,000,000. 


Table 10.8 Trended hurricane losses

Year    Loss (10^3)    Year    Loss (10^3)    Year    Loss (10^3)
[...]   [...]          [...]   40,596         1975    192,013
[...]   [...]          1949    41,409         1972    198,446
1971    10,562         1959    47,905         1964    227,338
1956    14,474         1950    49,397         1960    329,511
1961    15,351         1954    52,600         1961    361,200
1966    16,983         1973    59,917         1969    421,680
1955    18,383         1980    63,123         1954    513,586
1958    19,030         1964    77,809         1954    545,778
1974    25,304         1955    102,942        1970    750,389
1959    29,112         1967    103,217        1979    863,881
1971    30,146         [...]   123,680        1965    1,638,000
1976    33,727         [...]   140,136


The federal government is considering funding a program that would pro- 
vide 100% payment for all damages for any hurricane causing damage in excess 
of 5,000,000. You have been asked to make some preliminary estimates. 


(a) Estimate the mean, standard deviation, coefficient of variation, and 
skewness for the population of hurricane losses. 


(b) Estimate the first and second limited moments at 500,000,000. 


10.3 (*) There have been 30 claims recorded in a random sampling of claims. 
There were 2 claims for 2,000, 6 for 4,000, 12 for 6,000, and 10 for 8,000. 
Determine the empirical skewness coefficient. 


10.3 EMPIRICAL DISTRIBUTIONS FOR GROUPED DATA 


For grouped data, construction of the empirical distribution as defined pre- 
viously is not possible. However, it is possible to approximate the empirical 
distribution. The strategy is to obtain values of the empirical distribution 
function wherever possible and then connect those values in some reasonable 
way. For grouped data, the distribution function is usually approximated 
by connecting the points with straight lines. Other interpolation methods 
are discussed in Chapter 15. For notation, let the group boundaries be
$c_0 < c_1 < \cdots < c_k$, where often $c_0 = 0$ and $c_k = \infty$. The number of
observations falling between $c_{j-1}$ and $c_j$ is denoted $n_j$ with $\sum_{j=1}^{k} n_j = n$. For
such data, we are able to determine the empirical distribution at each group
boundary. That is, $F_n(c_j) = (1/n) \sum_{i=1}^{j} n_i$. Note that no rule is proposed
for observations that fall on a group boundary. There is no correct approach,
but whatever approach is used, consistency in assignment of observations to
groups should be used. You will note that in Data Set C it is not possible to
tell how the assignments were made. If we had that knowledge, it would not
affect any subsequent calculations.


Definition 10.12 For grouped data, the distribution function obtained by
connecting the values of the empirical distribution function at the group
boundaries with straight lines is called the ogive. The formula is

$$F_n(x) = \frac{c_j - x}{c_j - c_{j-1}} F_n(c_{j-1}) + \frac{x - c_{j-1}}{c_j - c_{j-1}} F_n(c_j), \quad c_{j-1} \le x \le c_j.$$

This function is differentiable at all values except group boundaries.
Therefore the density function can be obtained. To completely specify the density
function, it will arbitrarily be made right-continuous.


Definition 10.13 For grouped data, the empirical density function can be
obtained by differentiating the ogive. The resulting function is called a
histogram. The formula is

$$f_n(x) = \frac{F_n(c_j) - F_n(c_{j-1})}{c_j - c_{j-1}} = \frac{n_j}{n(c_j - c_{j-1})}, \quad c_{j-1} \le x < c_j.$$


Many computer programs that produce histograms actually create a bar 
chart with bar heights proportional to nj/n. This is acceptable if the groups 
have equal width, but if not, then the above formula is needed. The ad- 
vantage of this approach is that the histogram is indeed a density function, 
and among other things, areas under the histogram can be used to obtain 
empirical probabilities. 


Example 10.14 Construct the ogive and histogram for Data Set C.

The distribution function is

$$F_{227}(x) = \begin{cases}
0.000058150x, & 0 \le x \le 7,500, \\
0.29736 + 0.000018502x, & 7,500 \le x \le 17,500, \\
0.47210 + 0.000008517x, & 17,500 \le x \le 32,500, \\
0.63436 + 0.000003524x, & 32,500 \le x \le 67,500, \\
0.78433 + 0.000001302x, & 67,500 \le x \le 125,000, \\
0.91882 + 0.000000227x, & 125,000 \le x \le 300,000, \\
\text{undefined}, & x > 300,000,
\end{cases}$$

where, for example, for the range $32,500 \le x \le 67,500$ the calculation is

$$F_{227}(x) = \frac{67,500 - x}{67,500 - 32,500} \cdot \frac{170}{227} + \frac{x - 32,500}{67,500 - 32,500} \cdot \frac{198}{227}.$$

The value is undefined above 300,000 because the last interval has a width of
infinity. A graph of the ogive for values up to 125,000 appears in Figure 10.1.
Fig. 10.1 Ogive for general liability losses. [Plot of the ogive for x from 0 to 125,000.]

Fig. 10.2 Histogram of general liability losses. [Plot of the histogram for x from 0 to 125,000; vertical scale from 0 to 0.00006.]


The derivative is simply a step function with the following values.

$$f_{227}(x) = \begin{cases}
0.000058150, & 0 \le x < 7,500, \\
0.000018502, & 7,500 \le x < 17,500, \\
0.000008517, & 17,500 \le x < 32,500, \\
0.000003524, & 32,500 \le x < 67,500, \\
0.000001302, & 67,500 \le x < 125,000, \\
0.000000227, & 125,000 \le x < 300,000, \\
\text{undefined}, & x \ge 300,000.
\end{cases}$$


A graph of the function up to 125,000 appears in Figure 10.2. 
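
As a rough illustration of Definitions 10.12 and 10.13, the following Python sketch evaluates an ogive and a histogram from group boundaries and counts. The function names and the small grouped data set are hypothetical (they are not Data Set C), and the sketch assumes every group has finite width.

```python
def ogive(boundaries, counts, x):
    """Linear interpolation of the empirical cdf between group boundaries."""
    n = sum(counts)
    cumulative = 0  # observations at or below the lower boundary of the group
    for j, n_j in enumerate(counts):
        lo, hi = boundaries[j], boundaries[j + 1]
        if lo <= x <= hi:
            # Equivalent to weighting F_n(lo) and F_n(hi) by distance from x
            return (cumulative + n_j * (x - lo) / (hi - lo)) / n
        cumulative += n_j
    raise ValueError("x lies outside the group boundaries")

def histogram(boundaries, counts, x):
    """Right-continuous derivative of the ogive: n_j / (n * group width)."""
    n = sum(counts)
    for j, n_j in enumerate(counts):
        lo, hi = boundaries[j], boundaries[j + 1]
        if lo <= x < hi:
            return n_j / (n * (hi - lo))
    raise ValueError("x lies outside the group boundaries")

# Hypothetical grouped data: 4 observations in (0, 10], 10 in (10, 30], 6 in (30, 100]
boundaries = [0, 10, 30, 100]
counts = [4, 10, 6]
print(ogive(boundaries, counts, 20))      # value of the ogive halfway through the second group
print(histogram(boundaries, counts, 20))  # height of the histogram on the second group
```

Because the heights are divided by the group widths, the areas of the bars sum to 1, which is the property that makes the histogram a density function even when the groups have unequal widths.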
Table 10.9 Data for Exercise 10.4

Payment range    Number of payments
0-25             6
25-50            24
50-75            30
75-100           31
100-150          57
150-250          80
250-500          85
500-1000         54
1000-2000        15
2000-4000        10
Over 4000        0


10.3.1 Exercises 
10.4 Construct the ogive and histogram for the data in Table 10.9. 


10.5 (*) The following 20 wind losses (in millions of dollars) were recorded
in one year:

1  1  1  1  1  2  2  3  3  4
6  6  8  10  13  14  15  18  22  25


(a) Construct an ogive based on using class boundaries at 0.5, 2.5, 8.5, 
15.5, and 29.5. 


(b) Construct a histogram using the same boundaries as in part (a). 


10.6 The data in Table 10.10 are from Herzog and Laverty [54]. A certain 
class of 15-year mortgages was followed from issue until December 31, 1993. 
The issues were split into those that were refinances of existing mortgages 
and those that were original issues. Each entry in the table provides the 
number of issues and the percentage of them that were still in effect after the 
indicated number of years. Draw as much of the two ogives (on the same 
graph) as is possible from the data. Does it appear from the ogives that the 
lifetime variable (time to mortgage termination) has a different distribution 
for refinanced versus original issues? 


10.7 (*) The data in Table 10.11 were collected (units are millions of dollars). 
Construct the histogram. 


10.8 (*) Forty losses have been observed. Sixteen are between 1 and 4/3 and
those 16 losses total 20. Ten losses are between 4/3 and 2 with a total of 15.
Ten more are between 2 and 4 with a total of 35. The remaining 4 losses
are greater than 4. Using the empirical model based on these observations,
determine E(X ∧ 2).


Table 10.10 Data for Exercise 10.6

         Refinances                Original
Years    No. issued    Survived    No. issued    Survived
[...]    [...]         99.97       12,813        99.88
[...]    [...]         99.82       18,787        99.43
3.5      1,550         99.03       22,513        [...]
4.5      1,256         98.41       21,420        [...]
5.5      1,619         97.78       26,790        [...]


Table 10.11 Data for Exercise 10.7

Loss         No. of observations
0-2          25
2-10         10
10-100       10
100-1,000    5


10.9 A sample of size 2,000 contains 1,700 observations that are no greater
than 6,000, 30 that are greater than 6,000 but no greater than 7,000, and 270
that are greater than 7,000. The total amount of the 30 observations that
are between 6,000 and 7,000 is 200,000. The value of E(X ∧ 6,000) for the
empirical distribution associated with these observations is 1,810. Determine
E(X ∧ 7,000) for the empirical distribution.


11 


modified data 


11.1 POINT ESTIMATION 


It is not unusual for data to be incomplete due to censoring or truncation. 
The formal definitions are as follows. 


Definition 11.1 An observation is truncated from below (also called left
truncated) at d if when it is below d it is not recorded but when it is above d
it is recorded at its observed value.

An observation is truncated from above (also called right truncated) at
u if when it is above u it is not recorded but when it is below u it is recorded
at its observed value.

An observation is censored from below (also called left censored) at d if,
when it is below d it is recorded as being equal to d but when it is above d it
is recorded at its observed value.

An observation is censored from above (also called right censored) at u
if, when it is above u it is recorded as being equal to u but when it is below u
it is recorded at its observed value.


The most common occurrences are left truncation and right censoring. Left 
truncation occurs when an ordinary deductible of d is applied. When a poli- 
cyholder has a loss below d, he or she knows no benefits will be paid and so 


Loss Models: From Data to Decisions, Second Edition. 
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot 
does not inform the insurer. When the loss is above d, the amount of the loss
will be reported. A policy limit is an example of right censoring. When the 
amount of the loss exceeds u, benefits beyond that value are not paid and so 
the exact value is not recorded. However, it is known that a loss of at least u 
has occurred. 

When constructing a mortality table, it is impractical to follow people from 
birth to death. It is more common to follow a group of people of varying ages 
for a few years. When a person joins a study, he or she is alive at that time. 
This person’s age at death must be at least as great as the age at entry to the 
study and thus has been left truncated. If the person is alive when the study 
ends, right censoring has occurred. The person’s age at death is not known, 
but it is known that it is at least as large as the age when the study ended. 
Right censoring also affects those who leave the study prior to its end due to 
surrender. 

Because left truncation and right censoring are the most common occur- 
rences in actuarial work, they are the only cases that will be covered in this 
chapter. To save words, truncated will always mean truncated from below 
and censored will always mean censored from above. 

When trying to construct an empirical distribution from truncated or cen- 
sored data, the first task is to create notation to summarize the data. For 
individual data, there are three facts that are needed. First is the truncation 
point for that observation. Let that value be $d_j$ for the jth observation. If
there was no truncation, $d_j = 0$. The second is the observation itself. The
notation used will depend on whether or not that observation was censored. If
it was not censored, let its value be $x_j$. If it was censored, let its value be $u_j$.
When this subject is presented more formally, a distinction is made between 
the case where the censoring point is known in advance and where it is not. 
For example, a liability insurance policy with a policy limit usually has the 
censoring point known prior to the receipt of any claims. On the other hand, 
in a mortality study of insured lives, those that surrender their policy do so 
at an age that was not known when the policy was sold. In this chapter no 
distinction will be made between the two cases. 

To perform the estimation, the raw data must be summarized in a useful
manner. The most interesting values are the uncensored observations. Let
$y_1 < y_2 < \cdots < y_k$ be the k unique values of the $x_j$s that appear in the sample,
where k must be less than or equal to the number of uncensored observations.
Let $s_j$ be the number of times the uncensored observation $y_j$ appears in the
sample. The final important quantity is the risk set at the jth ordered
observation $y_j$ and is denoted $r_j$. When thinking in terms of a mortality
study, the risk set comprises the individuals who are under observation at
that age. Included are all who die (have x values) at that age or later and all
who are censored (have u values) at that age or later. However, those who are
first observed (have d values) at that age or later were not under observation
at that time. The formula is


$$r_j = (\text{number of } x_i\text{s} \ge y_j) + (\text{number of } u_i\text{s} \ge y_j) - (\text{number of } d_i\text{s} \ge y_j).$$

Alternatively, because the total number of $d_i$s is equal to the total number of
$x_i$s and $u_i$s, we also have

$$r_j = (\text{number of } d_i\text{s} < y_j) - (\text{number of } x_i\text{s} < y_j) - (\text{number of } u_i\text{s} < y_j).$$

This latter version is a bit easier to conceptualize because it includes all who
have entered the study prior to the given age less those who have already left.
The key point is that the risk set is the number of people observed alive at
age $y_j$. If the data are loss amounts, the risk set is the number of policies with
observed loss amounts (either the actual amount or the maximum amount due
to a policy limit) greater than or equal to $y_j$ less those with deductibles greater
than or equal to $y_j$. This also leads to a recursive version of the formula,

$$\begin{aligned}
r_j = r_{j-1} &+ (\text{number of } d_i\text{s between } y_{j-1} \text{ and } y_j) \\
&- (\text{number of } x_i\text{s between } y_{j-1} \text{ and } y_j) \\
&- (\text{number of } u_i\text{s between } y_{j-1} \text{ and } y_j), \qquad (11.2)
\end{aligned}$$

where between is interpreted to mean greater than or equal to $y_{j-1}$ and less
than $y_j$, and $r_0$ is set equal to 0.
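
To make the risk set formulas concrete, here is a minimal Python sketch that computes the y, s, and r values from individual records using both the "at or after" and the "prior to" versions and returns both so they can be compared. The record layout `(d, value, kind)`, the function name `risk_sets`, and the seven records are hypothetical illustrations, not data from the text.

```python
# Hypothetical records: (truncation point d, recorded value, "x" = uncensored, "u" = censored)
records = [
    (0.0, 0.8, "x"),
    (0.0, 1.5, "u"),
    (0.0, 2.9, "x"),
    (1.0, 2.9, "x"),
    (2.0, 3.5, "u"),
    (3.0, 4.0, "x"),
    (0.0, 5.0, "u"),
]

def risk_sets(records):
    """Return (y, s, r_direct, r_alt) for left truncated, right censored data."""
    # Unique uncensored values, in increasing order
    y = sorted({v for _, v, kind in records if kind == "x"})
    # s_j: number of uncensored observations equal to y_j
    s = [sum(1 for _, v, kind in records if kind == "x" and v == yj) for yj in y]
    # First version: (x's and u's >= y_j) minus (d's >= y_j)
    r_direct = [
        sum(1 for d, v, _ in records if v >= yj) - sum(1 for d, v, _ in records if d >= yj)
        for yj in y
    ]
    # Second version: (d's < y_j) minus (x's and u's < y_j)
    r_alt = [
        sum(1 for d, v, _ in records if d < yj) - sum(1 for d, v, _ in records if v < yj)
        for yj in y
    ]
    return y, s, r_direct, r_alt

y, s, r_direct, r_alt = risk_sets(records)
print(y, s, r_direct, r_alt)
```

In this small data set the record with d = 3.0 is excluded from the risk set at 2.9 because it had not yet come under observation, which is the effect of subtracting the late entrants.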


The calculations appear in Tables 11.1 and 11.2. O 


Despite all the work we have done to this point, we have yet to produce
an estimator of the survival function. The one most commonly used is called
the Kaplan-Meier product-limit estimator [70]. Begin with $S(0) = 1$.
Because no one died prior to $y_1$, the survival function remains at 1 until this
value. Thinking conditionally, just before $y_1$, there were $r_1$ people available
to die, of which $s_1$ did so. Thus, the probability of surviving past $y_1$ is
$(r_1 - s_1)/r_1$. This becomes the value of $S(y_1)$ and the survival function remains
at that value until $y_2$. Again, thinking conditionally, the new survival value
at $y_2$ is $S(y_1)(r_2 - s_2)/r_2$. The general formula is

$$S_n(t) = \begin{cases}
1, & 0 \le t < y_1, \\
\prod_{i=1}^{j-1} \left( \dfrac{r_i - s_i}{r_i} \right), & y_{j-1} \le t < y_j,\ j = 2, \ldots, k, \\
\prod_{i=1}^{k} \left( \dfrac{r_i - s_i}{r_i} \right) \text{ or } 0, & t \ge y_k.
\end{cases}$$
i     d_i
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0

Table 11.2 Risk set calculations for Example 11.2

j    y_j    s_j    r_j
1    0.8    1      32 - 0 - 2 = 30  or  0 + 32 - 0 - 2 = 30
2    2.9    2      35 - 1 - 8 = 26  or  30 + 3 - 1 - 6 = 26
3    3.1    1      37 - 3 - 8 = 26  or  26 + 2 - 2 - 0 = 26
4    4.0    2      40 - 4 - 10 = 26  or  26 + 3 - 1 - 2 = 26
5    4.1    1      40 - 6 - 11 = 23  or  26 + 0 - 2 - 1 = 23
6    4.8    1      40 - 7 - 12 = 21  or  23 + 0 - 1 - 1 = 21


If $s_k = r_k$, then $S(t) = 0$ for $t \ge y_k$ makes sense. Everyone in the sample has
died by that value and so, empirically, survival past that age is not possible.
However, due to censoring, it is possible that at the age of the last death there
were still people alive but all were censored prior to death. We know that
survival past the last observed death age is possible, but there is no empirical
data available to complete the survival function. One option (the first one used
in the above formula) is to keep the function at its last value. This is clearly
the largest reasonable choice. Another option is to declare the function to be
zero past the last observed age, whether it is an observed death or a censored
age. This is the smallest reasonable choice and makes it possible to calculate
moments. An intermediate option is to use an exponential curve to reduce
the value from its current level to zero. Let $w = \max\{x_1, \ldots, x_n, u_1, \ldots, u_n\}$.
Then, for $t \ge w$,

$$S_n(t) = e^{(t/w) \ln s^*} = (s^*)^{t/w}, \quad \text{where } s^* = \prod_{i=1}^{k} \left( \frac{r_i - s_i}{r_i} \right).$$
There is an alternative method of obtaining the values of $s_j$ and $r_j$ that is
more suitable for Excel® spreadsheet work.¹ The steps are as follows:

1. There should be one row for each data point. The points need not be
in any particular order.

2. Each row should have three entries. The first entry should be $d_j$, the
second entry should be $u_j$ or $x_j$ (there can be only one of these). The
third entry should be the letter "x" (without the quotes) if the second
entry is an x-value (an observed value) and should be the letter "u" if
the second entry is a u-value (a censored value). Assume, for example,
that the $d_j$s occupy cells B6:B45, the $u_j$s and $x_j$s occupy C6:C45, and
the x/u letters occupy D6:D45.

3. Create the columns for y, r, and s as follows.

4. For the ordered observed values (say they should begin in cell F2), start
with the lowest d-value. The formula in F2 is =MIN(B6:B45).

5. Then in cell F3 enter the formula

=MIN(IF(C$6:C$45>F2,IF(D$6:D$45="x",C$6:C$45,1E36),1E36)).

Because this is an array formula, it must be entered with Ctrl-Shift-Enter.
Copy this formula into cells F4, F5, and so on until the value
1E36 appears. Column F should now contain the unique, ordered,
y-values.

6. In cell G3 enter the formula

=COUNTIF(B$6:B$45,"<"&F3)-COUNTIF(C$6:C$45,"<"&F3).

Copy this formula into cells G4, G5, and so on until values appear across
from all but the last value in column F. This column contains the risk
set values.

7. In cell H3 enter the formula

=SUM(IF(C$6:C$45=F3,IF(D$6:D$45="x",1,0),0)).

Enter this array formula with Ctrl-Shift-Enter and then copy it into H4,
H5, and so on to match the rows of column G. This column contains
the s-values.

8. Begin the calculation of S(y) by entering a 1 in cell I2.

9. Calculate the next S(y) value by entering the following formula in cell
I3.

=I2*(G3-H3)/G3.

Then copy this formula into cells I4, I5, and so on to complete the
process.

¹ This scheme was devised by Charles Thayer and improved by Margie Rosenberg. It is a
great improvement over the author's scheme presented in earlier drafts. These instructions
work with Office XP and should work in a similar manner for other versions.


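Example 11.3 Determine the Kaplan-Meier estimate for Data Set D2.

Based on the previous example, we have

$$S_{40}(t) = \begin{cases}
1, & 0 \le t < 0.8, \\
\frac{30-1}{30} = 0.9667, & 0.8 \le t < 2.9, \\
0.9667 \cdot \frac{26-2}{26} = 0.8923, & 2.9 \le t < 3.1, \\
0.8923 \cdot \frac{26-1}{26} = 0.8580, & 3.1 \le t < 4.0, \\
0.8580 \cdot \frac{26-2}{26} = 0.7920, & 4.0 \le t < 4.1, \\
0.7920 \cdot \frac{23-1}{23} = 0.7576, & 4.1 \le t < 4.8, \\
0.7576 \cdot \frac{21-1}{21} = 0.7215, & 4.8 \le t < 5.0, \\
0.7215 \text{ or } 0 \text{ or } 0.7215^{t/5.0}, & t \ge 5.0.
\end{cases}$$
□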


An alternative to the Kaplan-Meier estimate is a modification of the
Nelson-Åalen estimate introduced earlier. As before, this method directly estimates
the cumulative hazard rate function. The following is an intuitive derivation
of this estimator. Let r(t) be the risk set at any time t and let h(t) be the
hazard rate function. Also, let s(t) be the expected total number of observed
deaths prior to time t. It is reasonable to conclude that

$$s(t) = \int_0^t r(u) h(u)\,du.$$

Taking derivatives,

$$ds(t) = r(t) h(t)\,dt.$$

Then,

$$\frac{ds(t)}{r(t)} = h(t)\,dt.$$

Integrating both sides yields

$$\int_0^t \frac{ds(u)}{r(u)} = \int_0^t h(u)\,du = H(t).$$

Now replace the true expected count s(t) by $\hat s(t)$, the observed number of
deaths by time t. It is a step function, increasing by $s_i$ at each death time.
Therefore, the left-hand side becomes

$$\sum_{i:\, y_i \le t} \frac{s_i}{r_i},$$

which defines the estimator, $\hat H(t)$. The Nelson-Åalen estimator is

$$\hat H(t) = \begin{cases}
0, & 0 \le t < y_1, \\
\sum_{i=1}^{j-1} \dfrac{s_i}{r_i}, & y_{j-1} \le t < y_j,\ j = 2, \ldots, k, \\
\sum_{i=1}^{k} \dfrac{s_i}{r_i}, & t \ge y_k,
\end{cases}$$

and then

$$\hat S(t) = e^{-\hat H(t)}.$$

For $t \ge w$, alternative estimates are $\hat S(t) = 0$ and $\hat S(t) = \hat S(y_k)^{t/w}$.


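Example 11.4 Determine the Nelson-Åalen estimate of the survival function
for Data Set D2.

$$\hat H(t) = \begin{cases}
0, & 0 \le t < 0.8, \\
\frac{1}{30} = 0.0333, & 0.8 \le t < 2.9, \\
0.0333 + \frac{2}{26} = 0.1103, & 2.9 \le t < 3.1, \\
0.1103 + \frac{1}{26} = 0.1487, & 3.1 \le t < 4.0, \\
0.1487 + \frac{2}{26} = 0.2256, & 4.0 \le t < 4.1, \\
0.2256 + \frac{1}{23} = 0.2691, & 4.1 \le t < 4.8, \\
0.2691 + \frac{1}{21} = 0.3167, & t \ge 4.8.
\end{cases}$$

$$\hat S(t) = \begin{cases}
1, & 0 \le t < 0.8, \\
e^{-0.0333} = 0.9672, & 0.8 \le t < 2.9, \\
e^{-0.1103} = 0.8956, & 2.9 \le t < 3.1, \\
e^{-0.1487} = 0.8618, & 3.1 \le t < 4.0, \\
e^{-0.2256} = 0.7980, & 4.0 \le t < 4.1, \\
e^{-0.2691} = 0.7641, & 4.1 \le t < 4.8, \\
e^{-0.3167} = 0.7285, & 4.8 \le t < 5.0, \\
0.7285 \text{ or } 0 \text{ or } 0.7285^{t/5.0}, & t \ge 5.0.
\end{cases}$$
□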
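
The product-limit and Nelson-Åalen calculations in Examples 11.3 and 11.4 can be reproduced with a few lines of Python. This is a sketch only: the function names are hypothetical, the y, s, and r values are those of Table 11.2, and the code covers only t below w = 5.0, leaving the choice of tail treatment to the reader.

```python
from math import exp

# y_j, s_j, r_j from Table 11.2 (Data Set D2)
y = [0.8, 2.9, 3.1, 4.0, 4.1, 4.8]
s = [1, 2, 1, 2, 1, 1]
r = [30, 26, 26, 26, 23, 21]

def kaplan_meier(y, s, r, t):
    """Product of (r_i - s_i) / r_i over all death times y_i <= t."""
    survival = 1.0
    for y_i, s_i, r_i in zip(y, s, r):
        if y_i <= t:
            survival *= (r_i - s_i) / r_i
    return survival

def nelson_aalen(y, s, r, t):
    """Sum of s_i / r_i over all death times y_i <= t."""
    return sum(s_i / r_i for y_i, s_i, r_i in zip(y, s, r) if y_i <= t)

for t in y:
    H = nelson_aalen(y, s, r, t)
    # columns: t, Kaplan-Meier S, Nelson-Aalen H, exp(-H)
    print(t, round(kaplan_meier(y, s, r, t), 4), round(H, 4), round(exp(-H), 4))
```

The two survival estimates are close but not identical at every death time, with exp(-H) slightly above the product-limit value in each row.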
It is important to note that when the data are truncated the resulting dis-
tribution function is the distribution function for payments given that they 
are above the smallest truncation point (that is, the smallest d value). Empir- 
ically, there is no information about observations below that value, and thus 
there can be no information for that range. It should be noted that all the 
notation and formulas in this section are consistent with those in Section 10.2. 
If it turns out that there was no censoring or truncation, using the formulas in 
this section will lead to the same results as when using the empirical formulas 
in Section 10.2. 


11.1.1 Exercises 


11.1 Repeat Example 11.2, treating “surrender” as “death.” The easiest way 
to do this is to reverse the x and u labels and then use the above formula. In 
this case death produces censoring because those who die are lost to observation
and thus their surrender time is never observed. Treat those who lasted
the entire five years as surrenders at that time.


11.2 Determine the Kaplan-Meier estimate for the time to surrender for Data
Set D2. Treat those who lasted the entire five years as surrenders at that time.


11.3 Determine the Nelson-Åalen estimate of H(t) and S(t) for Data Set D2
where the variable is time to surrender. 


11.4 Determine the Kaplan—Meier and Nelson—Aalen estimates of the distri- 
bution function of the amount of a workers compensation loss. First do this 
using the raw data from Data Set B. Then repeat the exercise, modifying the 
data by left truncation at 100 and right censoring at 1,000. 


11.5 (*) You are given the following times of first claim for five randomly 
selected auto insurance policies: 1, 2, 3, 4, 5. You are later told that one of 
the five times given is actually the time of policy lapse but you are not told 
which one. The smallest product-limit estimate of S(4), the probability that 
the first claim occurs after time 4, would result if which of the given times 
arose from the lapsed policy? 


11.6 (*) For a mortality study with right censored data, you are given the 
information in Table 11.3. Calculate the estimate of the survival function at 
time 12 using the Nelson—Aalen estimate. 


11.7 (*) Three hundred mice were observed at birth. An additional 20 mice 
were first observed at age 2 (days) and 30 more were first observed at age 4. 
There were 6 deaths at age 1, 10 at age 3, 10 at age 4, a at age 5, b at age 
9, and 6 at age 12. In addition, 45 mice were lost to observation at age 7, 
35 at age 10, and 15 at age 13. The following product-limit estimates were 
obtained: $S_{350}(7) = 0.892$ and $S_{350}(13) = 0.856$. Determine the values of a
and b.

Table 11.3 Data for Exercise 11.6

Time, t_j    Number of deaths, s_j    Number at risk, r_j
5            2                        15
7            1                        12
10           1                        10
12           2                        6


11.8 (*) Let n be the number of lives observed from birth. None were cen- 
sored and no two lives died at the same age. At the time of the ninth death 
the Nelson—Aalen estimate of the cumulative hazard rate is 0.511 and at the 


time of the tenth death it is 0.588. Estimate the value of the survival function
at the time of the third death.


11.9 (*) All members of a study joined at birth; however, some may leave 
the study by means other than death. At the time of the third death, there 
was one death (that is, $s_3 = 1$); at the time of the fourth death there were two
deaths; and at the time of the fifth death there was one death. The following
product-limit estimates were obtained: $S_n(y_3) = 0.72$, $S_n(y_4) = 0.60$, and
$S_n(y_5) = 0.50$. Determine the number of censored observations between times
$y_4$ and $y_5$. Assume no observations were censored at the death times.


11.2 MEANS, VARIANCES, AND INTERVAL ESTIMATION 


When all of the information is available, working with the empirical estimate 
of the survival function is straightforward. 


Example 11.5 Demonstrate that for complete data the empirical estimator 
of the survival function is unbiased and consistent. 


Recall that the empirical estimate of S(x) is $S_n(x) = Y/n$, where Y is the
number of observations in the sample that are greater than x. Then Y must
have a binomial distribution with parameters n and S(x). Then,

$$E[S_n(x)] = E\left(\frac{Y}{n}\right) = \frac{nS(x)}{n} = S(x),$$

demonstrating that the estimator is unbiased. The variance is

$$\mathrm{Var}[S_n(x)] = \mathrm{Var}\left(\frac{Y}{n}\right) = \frac{S(x)[1 - S(x)]}{n},$$

which has a limit of zero, thus verifying consistency. □
In order to make use of the result, the best we can do for the variance
is estimate it. It is unlikely we know the value of S(x) because that is the
quantity we are trying to estimate. The estimated variance is given by

$$\widehat{\mathrm{Var}}[S_n(x)] = \frac{S_n(x)[1 - S_n(x)]}{n}.$$

The same results hold for empirically estimated probabilities. Let $p = \Pr(a <
X \le b)$. The empirical estimate of p is $\hat p = S_n(a) - S_n(b)$. Arguments similar
to those used in the example above verify that $\hat p$ is unbiased and consistent,
with $\mathrm{Var}(\hat p) = p(1 - p)/n$.

When doing mortality studies or evaluating the effect of deductibles, we
sometimes are more interested in estimating conditional quantities.


Example 11.6 Using the full information from the observations in Data Set 
D1, empirically estimate q2 and estimate the variance of this estimator. 


For this data set, n = 30. By duration 2, one had died, and by duration 3,
three had died. Thus, $S_{30}(2) = 29/30$ and $S_{30}(3) = 27/30$. The empirical estimate
is

$$\hat q_2 = \frac{S_{30}(2) - S_{30}(3)}{S_{30}(2)} = \frac{2}{29}.$$

The challenge is in trying to evaluate the mean and variance of this estimator.
Let X be the number of deaths between durations 0 and 2 and let Y be the
number of deaths between durations 2 and 3. Then $\hat q_2 = Y/(30 - X)$. It
should be clear that it is not possible to evaluate $E(\hat q_2)$ because with positive
probability X will equal 30 and the estimator will not be defined.² The usual
solution is to obtain a conditional estimate of the variance. That is, given
there were 29 people alive at duration 2, determine the variance. Then the
only random quantity is Y and we have

$$\widehat{\mathrm{Var}}\left[\hat q_2 \mid S_{30}(2) = \tfrac{29}{30}\right] = \frac{(2/29)(27/29)}{29}.$$
□


In general, let n be the initial sample, $n_x$ the number alive at age x, and
$n_y$ the number alive at age y. Then,

$$\widehat{\mathrm{Var}}({}_{y-x}\hat q_x \mid n_x) = \widehat{\mathrm{Var}}({}_{y-x}\hat p_x \mid n_x) = \frac{(n_x - n_y)(n_y)}{n_x^3}.$$


2 This is a situation where the Bayesian approach (introduced in Section 12.4) works better. 
Bayesian analyses proceed from the data as observed and are not concerned with other 
values that might have been observed. If X = 30, there is no estimate to worry about, 
while if X < 30, analysis can proceed. 
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Example 11.7 Using Data Set B, empirically estimate the probability that a 
payment will be at least 1,000 when there is a deductible of 250. 


Empirically, there were 13 losses above the deductible, of which 4 exceeded 1,250 (the loss that produces a payment of 1,000). The empirical estimate is 4/13. Using survival function notation, the estimator is $S_{20}(1250)/S_{20}(250)$. Once again, only a conditional variance can be obtained. The estimated variance is $4(9)/13^3$. □
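
The conditional variance calculations of Examples 11.6 and 11.7 follow one pattern, which the short Python sketch below illustrates. The function name `conditional_estimate` is hypothetical (it does not come from the text), and exact fractions are used only so that the results can be compared directly with the values quoted in the examples.

```python
from fractions import Fraction


def conditional_estimate(n_x, n_y):
    """Empirical conditional estimates given n_x alive at x and n_y alive at y.

    Returns (q_hat, p_hat, var_hat), where q_hat estimates the probability of
    failing between x and y, p_hat = 1 - q_hat, and var_hat is the estimated
    conditional variance (n_x - n_y) * n_y / n_x**3 shared by both estimators.
    """
    q_hat = Fraction(n_x - n_y, n_x)
    p_hat = Fraction(n_y, n_x)
    var_hat = Fraction((n_x - n_y) * n_y, n_x ** 3)
    return q_hat, p_hat, var_hat


# Example 11.6: 29 alive at duration 2, 27 alive at duration 3.
q2, _, var_q2 = conditional_estimate(29, 27)
# Example 11.7: 13 losses above 250, of which 4 exceed 1,250.
_, p_pay, var_pay = conditional_estimate(13, 4)
print(q2, var_q2, p_pay, var_pay)
```

The same function serves both examples because the estimated variance of a conditional proportion does not depend on whether the survival or the failure probability is being estimated.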


For grouped data, there is no problem if the survival function is to be 
estimated at a boundary. For interpolated values using the ogive, it is a bit 
more complex. 


Example 11.8 Determine the expected value and variance of the estimators 
of the survival function and density function using the ogive and histogram, 
respectively. 


Suppose the value of x is between the boundaries $c_{j-1}$ and $c_j$. Let Y be the number of observations at or below $c_{j-1}$ and let Z be the number of observations above $c_{j-1}$ and at or below $c_j$. Then

$$S_n(x) = 1 - \frac{Y(c_j - c_{j-1}) + Z(x - c_{j-1})}{n(c_j - c_{j-1})}$$

and

$$E[S_n(x)] = 1 - \frac{n[1 - S(c_{j-1})](c_j - c_{j-1}) + n[S(c_{j-1}) - S(c_j)](x - c_{j-1})}{n(c_j - c_{j-1})} = S(c_{j-1})\frac{c_j - x}{c_j - c_{j-1}} + S(c_j)\frac{x - c_{j-1}}{c_j - c_{j-1}}.$$

This estimator is biased (although it is an unbiased estimator of the true interpolated value). The variance is

$$\mathrm{Var}[S_n(x)] = \frac{(c_j - c_{j-1})^2\,\mathrm{Var}(Y) + (x - c_{j-1})^2\,\mathrm{Var}(Z) + 2(c_j - c_{j-1})(x - c_{j-1})\,\mathrm{Cov}(Y, Z)}{[n(c_j - c_{j-1})]^2},$$

where $\mathrm{Var}(Y) = nS(c_{j-1})[1 - S(c_{j-1})]$, $\mathrm{Var}(Z) = n[S(c_{j-1}) - S(c_j)][1 - S(c_{j-1}) + S(c_j)]$, and $\mathrm{Cov}(Y, Z) = -n[1 - S(c_{j-1})][S(c_{j-1}) - S(c_j)]$. For the density estimate,

$$f_n(x) = \frac{Z}{n(c_j - c_{j-1})}$$

and

$$E[f_n(x)] = \frac{S(c_{j-1}) - S(c_j)}{c_j - c_{j-1}},$$

which is biased for the true density function. The variance is

$$\mathrm{Var}[f_n(x)] = \frac{[S(c_{j-1}) - S(c_j)][1 - S(c_{j-1}) + S(c_j)]}{n(c_j - c_{j-1})^2}.$$

□

Example 11.9 For Data Set C estimate S(10,000), f(10,000), and the vari- 
ance of your estimators. 


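The point estimates are

$$S_{227}(10{,}000) = 1 - \frac{99(17{,}500 - 7{,}500) + 42(10{,}000 - 7{,}500)}{227(17{,}500 - 7{,}500)} = 0.51762,$$

$$f_{227}(10{,}000) = \frac{42}{227(17{,}500 - 7{,}500)} = 0.000018502.$$

The estimated variances are

$$\widehat{\mathrm{Var}}[S_{227}(10{,}000)] = \frac{1}{227(10{,}000)^2}\left[10{,}000^2\,\frac{99}{227}\,\frac{128}{227} + 2{,}500^2\,\frac{42}{227}\,\frac{185}{227} - 2(10{,}000)(2{,}500)\,\frac{99}{227}\,\frac{42}{227}\right] = 0.00094713$$

and

$$\widehat{\mathrm{Var}}[f_{227}(10{,}000)] = \frac{\frac{42}{227}\,\frac{185}{227}}{227(10{,}000)^2} = 6.6427 \times 10^{-12}.$$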
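
The grouped-data formulas of Example 11.8 are easy to mistype, so a numerical check may be useful. The sketch below is only an illustration: the function name `ogive_estimates` and its argument names are assumptions, and the unknown survival probabilities are replaced by their empirical values, as is done in Example 11.9.

```python
def ogive_estimates(n, y, z, c_lo, c_hi, x):
    """Ogive/histogram estimates at x, with c_lo <= x <= c_hi.

    n: sample size; y: observations at or below c_lo;
    z: observations above c_lo and at or below c_hi.
    Returns (S_n(x), f_n(x), estimated Var[S_n(x)], estimated Var[f_n(x)]).
    """
    width = c_hi - c_lo
    s_hat = 1 - (y * width + z * (x - c_lo)) / (n * width)
    f_hat = z / (n * width)

    # Empirical survival probabilities at the two boundaries.
    s_lo = 1 - y / n
    s_hi = 1 - (y + z) / n

    # Moments of the multinomial counts Y and Z with estimates plugged in.
    var_y = n * s_lo * (1 - s_lo)
    var_z = n * (s_lo - s_hi) * (1 - s_lo + s_hi)
    cov_yz = -n * (1 - s_lo) * (s_lo - s_hi)

    var_s = (width ** 2 * var_y
             + (x - c_lo) ** 2 * var_z
             + 2 * width * (x - c_lo) * cov_yz) / (n * width) ** 2
    var_f = (s_lo - s_hi) * (1 - s_lo + s_hi) / (n * width ** 2)
    return s_hat, f_hat, var_s, var_f


# Counts used in Example 11.9: n = 227, 99 values at or below 7,500,
# and 42 values in (7,500, 17,500].
print(ogive_estimates(227, 99, 42, 7_500, 17_500, 10_000))
```

With these inputs the four returned values agree with the figures quoted in Example 11.9 to the number of digits shown there.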


Discrete data such as in Data Set A can be considered a special form of 
grouped data. Each discrete possibility is similar to an interval. 


Example 11.10 Demonstrate that for a discrete random variable the empir- 
ical estimator of a particular outcome is unbiased and consistent and derive 
its variance. 


Let $N_j$ be the number of times the value $x_j$ was observed in the sample. Then $N_j$ has a binomial distribution with parameters n and $p(x_j)$. The empirical estimator is $p_n(x_j) = N_j/n$ and

$$E[p_n(x_j)] = E\left(\frac{N_j}{n}\right) = \frac{np(x_j)}{n} = p(x_j),$$

demonstrating that it is unbiased. Also,

$$\mathrm{Var}[p_n(x_j)] = \mathrm{Var}\left(\frac{N_j}{n}\right) = \frac{np(x_j)[1 - p(x_j)]}{n^2} = \frac{p(x_j)[1 - p(x_j)]}{n},$$

which goes to zero as $n \to \infty$, demonstrating consistency. □


Example 11.11 For Data Set A determine the empirical estimate of p(2) 
and estimate its variance. 


The empirical estimate is

$$p_{94,935}(2) = \frac{1{,}618}{94{,}935} = 0.017043$$

and its estimated variance is

$$\frac{0.017043(0.982957)}{94{,}935} = 1.76466 \times 10^{-7}.$$

□


It is possible to use the variances to construct confidence intervals for the 
unknown probability. 


Example 11.12 Use (9.3) and (9.4) to construct approximate 95% confi- 
dence intervals for p(2) using Data Set A. 


From (9.3),

$$0.95 = \Pr\left(-1.96 \le \frac{p_n(2) - p(2)}{\sqrt{p(2)[1 - p(2)]/n}} \le 1.96\right).$$

Solve this by making the inequality an equality and then squaring both sides to obtain (dropping the argument of (2) for simplicity),

$$\frac{(p_n - p)^2\,n}{p(1 - p)} = 1.96^2,$$

$$np_n^2 - 2npp_n + np^2 = 1.96^2 p - 1.96^2 p^2,$$

$$0 = (n + 1.96^2)p^2 - (2np_n + 1.96^2)p + np_n^2.$$

The solution is

$$p = \frac{2np_n + 1.96^2 \pm \sqrt{(2np_n + 1.96^2)^2 - 4(n + 1.96^2)np_n^2}}{2(n + 1.96^2)},$$

which provides the two endpoints of the confidence interval. Inserting the numbers from Data Set A ($p_n = 0.017043$, n = 94,935) produces a confidence interval of (0.016239, 0.017886).

Equation (9.4) provides the confidence interval directly as

$$p_n \pm 1.96\sqrt{\frac{p_n(1 - p_n)}{n}}.$$

Inserting the numbers from Data Set A gives 0.017043 ± 0.000823 for an interval of (0.016220, 0.017866). The answers for the two methods are very similar, which is to be expected when the sample size is large. The results are reasonable because it is well known that the normal distribution is a good approximation to the binomial. □
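
The two interval constructions of Example 11.12 can be compared numerically. The following Python sketch assumes only the count 1,618 and sample size 94,935 used in Example 11.11; the function names are hypothetical labels for the quadratic solution and for the direct normal approximation.

```python
import math


def quadratic_interval(count, n, z=1.96):
    """Endpoints from solving (n + z^2)p^2 - (2n p_n + z^2)p + n p_n^2 = 0."""
    p_n = count / n
    a = n + z ** 2
    b = 2 * n * p_n + z ** 2
    c = n * p_n ** 2
    root = math.sqrt(b ** 2 - 4 * a * c)
    return (b - root) / (2 * a), (b + root) / (2 * a)


def normal_interval(count, n, z=1.96):
    """Endpoints p_n -/+ z * sqrt(p_n (1 - p_n) / n)."""
    p_n = count / n
    half_width = z * math.sqrt(p_n * (1 - p_n) / n)
    return p_n - half_width, p_n + half_width


print(quadratic_interval(1_618, 94_935))
print(normal_interval(1_618, 94_935))
```

For a sample this large the two intervals differ only in the fifth decimal place, in line with the remark in the example.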


When data are censored or truncated, the matter becomes more complex. 
Counts no longer have the binomial distribution and therefore the distribution 
of the estimator is harder to obtain. While there are proofs available to back 
up the results presented here, they will not be provided. Instead, an attempt 
will be made to indicate why the results are reasonable. 

Consider the Kaplan-Meier product-limit estimator of S(t). It is the product of a number of terms of the form $(r_j - s_j)/r_j$, where $r_j$ was viewed as the number available to die at age $y_j$ and $s_j$ is the number who actually did so. Assume that the death ages and the number available to die are fixed, so that the value of $s_j$ is the only random quantity. As a random variable, $S_j$ has a binomial distribution based on a sample of $r_j$ lives and success probability $[S(y_{j-1}) - S(y_j)]/S(y_{j-1})$. The probability arises from the fact that those available to die were known to be alive at the previous death age. For one of these terms,

$$E\left(\frac{r_j - S_j}{r_j}\right) = \frac{r_j - r_j[S(y_{j-1}) - S(y_j)]/S(y_{j-1})}{r_j} = \frac{S(y_j)}{S(y_{j-1})}.$$

That is, this ratio is an unbiased estimator of the probability of surviving from one death age to the next one. Furthermore,

$$\mathrm{Var}\left(\frac{r_j - S_j}{r_j}\right) = \frac{1}{r_j}\,\frac{S(y_{j-1}) - S(y_j)}{S(y_{j-1})}\left[1 - \frac{S(y_{j-1}) - S(y_j)}{S(y_{j-1})}\right] = \frac{[S(y_{j-1}) - S(y_j)]\,S(y_j)}{r_j\,S(y_{j-1})^2}.$$

Now consider the estimated survival probability at one of the death ages. Its expected value is

$$E[S_n(y_j)] = E\left[\prod_{i=1}^{j}\frac{r_i - S_i}{r_i}\right] = \prod_{i=1}^{j} E\left(\frac{r_i - S_i}{r_i}\right) = \prod_{i=1}^{j}\frac{S(y_i)}{S(y_{i-1})} = \frac{S(y_j)}{S(y_0)},$$

where $y_0$ is the smallest observed age in the sample. In order to bring the expectation inside the product, it was assumed that the $S_i$ values are independent. The result demonstrates that at the death ages the estimator is unbiased.
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With regard to the variance, we first need a general result concerning the 
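variance of a product of independent random variables. Let $X_1, \ldots, X_n$ be independent random variables where $E(X_j) = \mu_j$ and $\mathrm{Var}(X_j) = \sigma_j^2$. Then,

$$\begin{aligned}
\mathrm{Var}(X_1 \cdots X_n) &= E(X_1^2 \cdots X_n^2) - E(X_1 \cdots X_n)^2 \\
&= E(X_1^2) \cdots E(X_n^2) - E(X_1)^2 \cdots E(X_n)^2 \\
&= (\mu_1^2 + \sigma_1^2) \cdots (\mu_n^2 + \sigma_n^2) - \mu_1^2 \cdots \mu_n^2.
\end{aligned}$$

For the product-limit estimator,

$$\begin{aligned}
\mathrm{Var}[S_n(y_j)] &= \prod_{i=1}^{j}\left[\frac{S(y_i)^2}{S(y_{i-1})^2} + \frac{[S(y_{i-1}) - S(y_i)]\,S(y_i)}{r_i\,S(y_{i-1})^2}\right] - \frac{S(y_j)^2}{S(y_0)^2} \\
&= \prod_{i=1}^{j}\left[\frac{r_i S(y_i)^2 + [S(y_{i-1}) - S(y_i)]\,S(y_i)}{r_i\,S(y_{i-1})^2}\right] - \frac{S(y_j)^2}{S(y_0)^2} \\
&= \prod_{i=1}^{j}\left[\frac{S(y_i)^2}{S(y_{i-1})^2}\,\frac{r_i S(y_i) + [S(y_{i-1}) - S(y_i)]}{r_i\,S(y_i)}\right] - \frac{S(y_j)^2}{S(y_0)^2} \\
&= \frac{S(y_j)^2}{S(y_0)^2}\left\{\prod_{i=1}^{j}\left[1 + \frac{S(y_{i-1}) - S(y_i)}{r_i\,S(y_i)}\right] - 1\right\}.
\end{aligned}$$
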


This formula is unpleasant to work with, so the following approximation is often used. It is based on the fact that for any set of small numbers $a_1, \ldots, a_n$ the product $(1 + a_1) \cdots (1 + a_n)$ is approximately $1 + a_1 + \cdots + a_n$. This follows because the missing terms are all products of two or more of the $a_i$s. If they are all small to begin with, the products will be even smaller and so can be ignored. Applying this produces the approximation

$$\mathrm{Var}[S_n(y_j)] \doteq \left[\frac{S(y_j)}{S(y_0)}\right]^2 \sum_{i=1}^{j}\frac{S(y_{i-1}) - S(y_i)}{r_i\,S(y_i)}.$$

Because it is unlikely that the survival function is known, an estimated value needs to be inserted. Recall that the estimated value of $S(y_j)$ is actually conditional on being alive at age $y_0$. Also, $(r_i - s_i)/r_i$ is an estimate of $S(y_i)/S(y_{i-1})$. Then,

$$\widehat{\mathrm{Var}}[S_n(y_j)] = S_n(y_j)^2 \sum_{i=1}^{j}\frac{s_i}{r_i(r_i - s_i)}. \tag{11.3}$$

Equation (11.3) is known as Greenwood's approximation. It is the only version that will be used in this text.

Example 11.13 Using Data Set D1, estimate the variance of $S_{30}(3)$ both directly and using Greenwood's formula. Do the same for ${}_2\hat q_3$.


Because there is no censoring or truncation, the empirical formula can be used to directly estimate this variance. There were three deaths out of 30 individuals, and therefore

$$\widehat{\mathrm{Var}}[S_{30}(3)] = \frac{(3/30)(27/30)}{30} = \frac{81}{30^3}.$$

For Greenwood's approximation, $r_1 = 30$, $s_1 = 1$, $r_2 = 29$, and $s_2 = 2$. The approximation is

$$\left(\frac{27}{30}\right)^2\left(\frac{1}{30(29)} + \frac{2}{29(27)}\right) = \frac{81}{30^3}.$$

It can be demonstrated that when there is no censoring or truncation the two formulas will always produce the same answer. Recall that the development of Greenwood's formula produced the variance only at death ages. The convention for non-death ages is to take the sum up to the last death age that is less than or equal to the age under consideration.

With regard to ${}_2\hat q_3$, arguing as in Example 11.6 produces an estimated (conditional) variance of

$$\widehat{\mathrm{Var}}({}_2\hat q_3) = \frac{(4/27)(23/27)}{27} = \frac{92}{27^3}.$$

For Greenwood's formula, we first must note that we are estimating

$${}_2q_3 = \frac{S(3) - S(5)}{S(3)} = 1 - \frac{S(5)}{S(3)}.$$

As with the empirical estimate, all calculations must be done given the 27 people alive at duration 3. Furthermore, the variance of ${}_2\hat q_3$ is the same as the variance of $\hat S(5)$ using only information from duration 3 and beyond. Starting from duration 3 there are three death times, 3.1, 4.0, and 4.8, with $r_1 = 27$, $r_2 = 26$, $r_3 = 25$, $s_1 = 1$, $s_2 = 1$, and $s_3 = 2$. Greenwood's approximation is

$$\left(\frac{23}{27}\right)^2\left(\frac{1}{27(26)} + \frac{1}{26(25)} + \frac{2}{25(23)}\right) = \frac{92}{27^3}.$$

□

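
Greenwood's approximation (11.3) is a one-line computation once the risk sets and death counts are listed. The sketch below is illustrative only; the function name `greenwood_variance` is an assumption, and exact fractions are used so that the agreement with the direct empirical variances for Data Set D1 can be seen without rounding.

```python
from fractions import Fraction


def greenwood_variance(s_hat, risk_sets, deaths):
    """Greenwood's approximation: s_hat^2 * sum of s_i / (r_i * (r_i - s_i)).

    s_hat is the estimated survival probability at the age of interest;
    risk_sets and deaths list r_i and s_i for the death ages up to that age.
    """
    total = sum(Fraction(s, r * (r - s)) for r, s in zip(risk_sets, deaths))
    return s_hat ** 2 * total


# Variance of S_30(3): r = (30, 29), s = (1, 2), S_30(3) = 27/30.
v_survival = greenwood_variance(Fraction(27, 30), [30, 29], [1, 2])

# Variance of the two-year death probability from duration 3, conditional
# on the 27 lives at duration 3: r = (27, 26, 25), s = (1, 1, 2).
v_q = greenwood_variance(Fraction(23, 27), [27, 26, 25], [1, 1, 2])

print(v_survival == Fraction(81, 30 ** 3), v_q == Fraction(92, 27 ** 3))
```

Passing a floating-point `s_hat` also works, in which case the result is a float.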
Example 11.14 Repeat the previous example, this time using all 40 observations in Data Set D2 and the incomplete information due to censoring and truncation.

For this example, the direct empirical approach is not available. That is because it is unclear what the sample size is (it varies over time as subjects enter and leave due to truncation and censoring). From Example 11.2, the relevant values within the first three years are $r_1 = 30$, $r_2 = 26$, $s_1 = 1$, and $s_2 = 2$. From Example 11.3, $S_{40}(3) = 0.8923$. Then, Greenwood's estimate is

$$(0.8923)^2\left(\frac{1}{30(29)} + \frac{2}{26(24)}\right) = 0.0034671.$$

An approximate 95% confidence interval can be constructed using the normal approximation. It is

$$0.8923 \pm 1.96\sqrt{0.0034671} = 0.8923 \pm 0.1154,$$

which corresponds to the interval (0.7769, 1.0077). For small sample sizes, it is possible that the confidence intervals admit values less than 0 or greater than 1.

With regard to ${}_2\hat q_3$, the relevant quantities are (starting at duration 3, but using the subscripts from the earlier examples for these data) $r_3 = 26$, $r_4 = 26$, $r_5 = 23$, $r_6 = 21$, $s_3 = 1$, $s_4 = 2$, $s_5 = 1$, and $s_6 = 1$. This gives an estimated variance of

$$\left(\frac{0.7215}{0.8923}\right)^2\left(\frac{1}{26(25)} + \frac{2}{26(24)} + \frac{1}{23(22)} + \frac{1}{21(20)}\right) = 0.0059502.$$

□



The previous example indicated that the usual method of constructing a confidence interval can lead to an unacceptable result. An alternative approach can be constructed as follows. Let $Y = \ln[-\ln S_n(t)]$. Using the delta method (see Theorem 12.17), the variance of Y can be approximated as follows. The function of interest is $g(x) = \ln(-\ln x)$. Its derivative is

$$g'(x) = \frac{1}{-\ln x}\,\frac{-1}{x} = \frac{1}{x \ln x}.$$

According to the delta method, the variance of Y can be approximated by

$$\{g'[E(S_n(t))]\}^2\,\mathrm{Var}[S_n(t)] = \frac{\mathrm{Var}[S_n(t)]}{[S(t)\ln S(t)]^2},$$

where we have used the fact that $S_n(t)$ is an unbiased estimator of S(t). Then, an estimated 95% confidence interval for $\theta = \ln[-\ln S(t)]$ is

$$\ln[-\ln S_n(t)] \pm 1.96\,\frac{\sqrt{\widehat{\mathrm{Var}}[S_n(t)]}}{S_n(t)\ln S_n(t)}.$$

Because $S(t) = \exp(-e^{\theta})$, putting each endpoint through this formula will provide a confidence interval for S(t). For the upper limit we have (where $\hat v = \widehat{\mathrm{Var}}[S_n(t)]$)

$$\begin{aligned}
\exp\left\{-e^{\ln[-\ln S_n(t)] + 1.96\sqrt{\hat v}/[S_n(t)\ln S_n(t)]}\right\}
&= \exp\left\{[\ln S_n(t)]\,e^{1.96\sqrt{\hat v}/[S_n(t)\ln S_n(t)]}\right\} \\
&= S_n(t)^U, \qquad U = \exp\left[\frac{1.96\sqrt{\hat v}}{S_n(t)\ln S_n(t)}\right].
\end{aligned}$$

Similarly, the lower limit is $S_n(t)^{1/U}$. This interval will always be inside the range zero to 1 and is referred to as the log-transformed confidence interval.


Example 11.15 Obtain the log-transformed confidence interval for S(3) as 
in Example 11.14. 


We have

$$U = \exp\left[\frac{1.96\sqrt{0.0034671}}{0.8923\ln(0.8923)}\right] = 0.32142.$$

The lower limit of the interval is $0.8923^{1/0.32142} = 0.70150$ and the upper limit is $0.8923^{0.32142} = 0.96404$. □
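
A small Python sketch makes the contrast between the linear and log-transformed intervals concrete. The function names are hypothetical; the inputs 0.8923 and 0.0034671 are the estimate and Greenwood variance quoted in Example 11.14, and the sketch assumes the survival estimate lies strictly between 0 and 1 so that its logarithm is defined and nonzero.

```python
import math


def linear_interval(s_hat, var_hat, z=1.96):
    """Normal-approximation interval s_hat -/+ z * sqrt(var_hat)."""
    half_width = z * math.sqrt(var_hat)
    return s_hat - half_width, s_hat + half_width


def log_transformed_interval(s_hat, var_hat, z=1.96):
    """Interval (s_hat^(1/U), s_hat^U), U = exp[z sqrt(var) / (s ln s)].

    Requires 0 < s_hat < 1. Because s ln s is negative, U is below 1 and
    s_hat^U is the upper limit.
    """
    u = math.exp(z * math.sqrt(var_hat) / (s_hat * math.log(s_hat)))
    return s_hat ** (1 / u), s_hat ** u


print(linear_interval(0.8923, 0.0034671))           # upper limit exceeds 1
print(log_transformed_interval(0.8923, 0.0034671))  # stays inside (0, 1)
```

The linear interval spills past 1 for these inputs, while the log-transformed limits remain valid probabilities.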


Similar results are available for the Nelson–Aalen estimator. An intuitive derivation of a variance estimate proceeds as follows. As in the derivation for the Kaplan-Meier estimator, all results are obtained assuming the risk set numbers are known, not random. The number of deaths at death time $t_i$ has approximately a Poisson distribution³ with parameter $r_i h(t_i)$ and so its variance is $r_i h(t_i)$, which can be approximated by $r_i(s_i/r_i) = s_i$. Then (also assuming independence),

$$\widehat{\mathrm{Var}}[\hat H(y_j)] = \widehat{\mathrm{Var}}\left(\sum_{i=1}^{j}\frac{S_i}{r_i}\right) = \sum_{i=1}^{j}\frac{\widehat{\mathrm{Var}}(S_i)}{r_i^2} = \sum_{i=1}^{j}\frac{s_i}{r_i^2}.$$

The linear confidence interval is simply

$$\hat H(t) \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}[\hat H(y_j)]}.$$

A log-transformed interval similar to the one developed for the survival function⁴ is

$$\hat H(t)\,U, \quad \text{where } U = \exp\left[\pm\frac{z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}[\hat H(y_j)]}}{\hat H(t)}\right].$$


3 A binomial assumption (as was used for the Kaplan-Meier derivation) could also have been made. Similarly, a Poisson assumption could have been used for the Kaplan-Meier derivation. The formulas given here are the ones most commonly used.

4 The derivation of this interval uses the transformation $Y = \ln \hat H(t)$.




Example 11.16 Construct an approximate 95% confidence interval for H(3) by each formula using all 40 observations in Data Set D2.


The point estimate is $\hat H(3) = \frac{1}{30} + \frac{2}{26} = 0.11026$. The estimated variance is $\frac{1}{30^2} + \frac{2}{26^2} = 0.0040697$. The linear confidence interval is

$$0.11026 \pm 1.96\sqrt{0.0040697} = 0.11026 \pm 0.12504$$

for an interval of (−0.01478, 0.23530). For the log-transformed interval,

$$U = \exp\left[\pm\frac{1.96(0.0040697)^{1/2}}{0.11026}\right] = \exp(\pm 1.13402) = 0.32174 \text{ to } 3.10813.$$

The interval is 0.11026(0.32174) = 0.03548 to 0.11026(3.10813) = 0.34270. □
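
The Nelson–Aalen calculations of Example 11.16 can be reproduced with a few lines of Python. The function names below are hypothetical, and the inputs are the two risk sets (30 and 26) and death counts (1 and 2) that the example uses for the first three years of Data Set D2.

```python
import math


def nelson_aalen(risk_sets, deaths):
    """Return (H_hat, var_hat) with H_hat = sum s_i/r_i, var_hat = sum s_i/r_i^2."""
    h_hat = sum(s / r for r, s in zip(risk_sets, deaths))
    var_hat = sum(s / r ** 2 for r, s in zip(risk_sets, deaths))
    return h_hat, var_hat


def cumulative_hazard_intervals(h_hat, var_hat, z=1.96):
    """Linear and log-transformed confidence intervals for H(t)."""
    half_width = z * math.sqrt(var_hat)
    linear = (h_hat - half_width, h_hat + half_width)
    u = math.exp(half_width / h_hat)  # the multiplier U for the "+" sign
    log_transformed = (h_hat / u, h_hat * u)
    return linear, log_transformed


h_hat, var_hat = nelson_aalen([30, 26], [1, 2])
print(h_hat, var_hat)
print(cumulative_hazard_intervals(h_hat, var_hat))
```

The linear interval has a negative lower limit, which is impossible for a cumulative hazard, whereas the log-transformed interval is positive by construction.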


11.2.1 Exercises 


11.10 Using the full information from Data Set D1, empirically estimate $q_j$ for j = 0,...,4 and ${}_5p_0$ where the variable of interest is time of surrender. Estimate the variance of each of your estimators. Identify which estimated variances are conditional. Interpret ${}_5q_0$ as the probability of surrendering before the five years expire.


11.11 For Data Set A determine the empirical estimate of the probability of 
having two or more accidents and estimate its variance. 


11.12 Repeat Example 11.13 using time to surrender as the variable. 


11.13 Repeat Example 11.14 using time to surrender as the variable. Inter- 
pret 2q3 as the probability of surrendering before the five years expire. 


11.14 Obtain the log-transformed confidence interval for S(3) in Exercise 11.13.


11.15 Construct 95% confidence intervals for H(3) by each formula using all 
40 observations in Data Set D2 with surrender being the variable of interest. 


11.16 (*) Ten individuals were observed from birth. All were observed until death. Table 11.4 gives the death ages. Let $V_1$ denote the estimated conditional variance of ${}_3\hat q_7$ if calculated without any distribution assumption. Let $V_2$ denote the conditional variance of ${}_3\hat q_7$ if calculated knowing that the survival function is S(t) = 1 − t/15. Determine $V_1 - V_2$.

Table 11.4 Data for Exercise 11.16

Age    Number of deaths
[table entries illegible in the source]


11.17 (*) For the interval from zero to one year, the exposure (r) is 15 and the number of deaths (s) is 3. For the interval from one to two years the exposure is 80 and the number of deaths is 24. For two to three years the values are 25 and 5; for three to four years they are 60 and 6; and for four to five years they are 10 and 3. Determine Greenwood's approximation to the variance of $\hat S(4)$.


11.18 (*) Observations can be censored, but there is no truncation. Let $y_j$ and $y_{j+1}$ be consecutive death ages. A 95% linear confidence interval for $H(y_j)$ using the Nelson–Aalen estimator is (0.07125, 0.22875) while a similar interval for $H(y_{j+1})$ is (0.15607, 0.38635). Determine $s_{j+1}$.


11.19 (*) A mortality study is conducted on 50 lives, all observed from age 0. At age 15 there were two deaths; at age 17 there were three censored observations; at age 25 there were four deaths; at age 30 there were c censored observations; at age 32 there were eight deaths; and at age 40 there were two deaths. Let S be the product-limit estimate of S(35) and let V be the Greenwood estimate of this estimator's variance. You are given $V/S^2 = 0.011467$. Determine the value of c.


11.20 (*) Fifteen cancer patients were observed from the time of diagnosis until the earlier of death or 36 months from diagnosis. Deaths occurred as follows: At 15 months there were two deaths; at 20 months there were three deaths; at 24 months there were two deaths; at 30 months there were d deaths; at 34 months there were two deaths; and at 36 months there was one death. The Nelson–Aalen estimate of H(35) is 1.5641. Determine the variance of this estimator.


11.21 (*) You are given the values in Table 11.5. Determine the standard deviation of the Nelson–Aalen estimator of the cumulative hazard function at time 20.

Table 11.5 Data for Exercise 11.21

y_j    r_j    s_j
1      100    15
8      65     20
17     40     13
25     31     31


11.3 KERNEL DENSITY MODELS


One problem with empirical distributions is that they are always discrete. If it is known that the true distribution is continuous, the empirical distribution may be viewed as a poor approximation. In this section, a method of obtaining a smooth, empirical-like distribution is introduced. Recall from Definition 10.4 that the idea is to replace each discrete piece of probability by a continuous random variable. While not necessary, it is customary that the continuous variable have a mean equal to the value of the point it replaces. This ensures that the kernel estimate has the same mean as the empirical estimate. One way to think about such a model is that it produces the final observed value in two steps. The first step is to draw a value at random from the empirical distribution. The second step is to draw a value at random from a continuous distribution whose mean is equal to the value drawn at the first step. The selected continuous distribution is called the kernel.

For notation, let $p(y_j)$ be the probability assigned to the value $y_j$ (j = 1,...,k) by the empirical distribution. Let $K_y(x)$ be a distribution function for a continuous distribution such that its mean is y. Let $k_y(x)$ be the corresponding density function.


Definition 11.17 A kernel density estimator of a distribution function is

$$\hat F(x) = \sum_{j=1}^{k} p(y_j)\,K_{y_j}(x)$$

and the estimator of the density function is

$$\hat f(x) = \sum_{j=1}^{k} p(y_j)\,k_{y_j}(x).$$


The function $k_y(x)$ is called the kernel. Three kernels will now be introduced.

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Definition 11.18 The uniform kernel is given by 


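
$$k_y(x) = \begin{cases} 0, & x < y - b, \\ \dfrac{1}{2b}, & y - b \le x \le y + b, \\ 0, & x > y + b, \end{cases}
\qquad
K_y(x) = \begin{cases} 0, & x < y - b, \\ \dfrac{x - y + b}{2b}, & y - b \le x \le y + b, \\ 1, & x > y + b. \end{cases}$$

The triangular kernel is given by

$$k_y(x) = \begin{cases} 0, & x < y - b, \\ \dfrac{x - y + b}{b^2}, & y - b \le x \le y, \\ \dfrac{y + b - x}{b^2}, & y \le x \le y + b, \\ 0, & x > y + b, \end{cases}
\qquad
K_y(x) = \begin{cases} 0, & x < y - b, \\ \dfrac{(x - y + b)^2}{2b^2}, & y - b \le x \le y, \\ 1 - \dfrac{(y + b - x)^2}{2b^2}, & y \le x \le y + b, \\ 1, & x > y + b. \end{cases}$$

The gamma kernel is given by letting the kernel have a gamma distribution with shape parameter α and scale parameter y/α. That is,

$$k_y(x) = \frac{x^{\alpha - 1}e^{-x\alpha/y}}{(y/\alpha)^{\alpha}\,\Gamma(\alpha)}.$$

Note that the gamma distribution has a mean of α(y/α) = y and a variance of $\alpha(y/\alpha)^2 = y^2/\alpha$.


In each case there is a parameter that relates to the spread of the kernel. In the first two cases it is the value of b > 0, which is called the bandwidth. In the gamma case, the value of α controls the spread, with a larger value indicating a smaller spread. There are other kernels that cover the range from zero to infinity.
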


Example 11.19 Determine the kernel density estimate for Example 10.8 us- 
ing each of the three kernels. 


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Fig. 11.1 Uniform kernel density with bandwidth 0.1. 


Fig. 11.2 Uniform kernel density with bandwidth 1.0. 


The empirical distribution places probability 1/8 at 1.0, 1/8 at 1.3, 2/8 at 1.5, 3/8 at 2.1, and 1/8 at 2.8. For a uniform kernel with a bandwidth of 0.1 we do not get much separation. The data point at 1.0 is replaced by a horizontal density function running from 0.9 to 1.1 with a height of $\frac{1}{8}\,\frac{1}{2(0.1)} = 0.625$. On the other hand, with a bandwidth of 1.0 that same data point is replaced by a horizontal density function running from 0.0 to 2.0 with a height of $\frac{1}{8}\,\frac{1}{2(1)} = 0.0625$. Figures 11.1 and 11.2 provide plots of the density functions.
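
The kernels of Definition 11.18 and the estimator of Definition 11.17 can be sketched in Python as follows. All function names are hypothetical. The five points and their probabilities (1/8, 1/8, 2/8, 3/8, 1/8) are taken as read from this example and should be treated as an assumption, and the gamma kernel is evaluated on the log scale, which is one way to limit overflow and underflow.

```python
import math


def uniform_kernel(x, y, b):
    """Uniform kernel density centered at y with bandwidth b."""
    return 1.0 / (2.0 * b) if y - b <= x <= y + b else 0.0


def triangular_kernel(x, y, b):
    """Triangular kernel density centered at y with bandwidth b."""
    if y - b <= x <= y:
        return (x - y + b) / b ** 2
    if y < x <= y + b:
        return (y + b - x) / b ** 2
    return 0.0


def gamma_kernel(x, y, alpha):
    """Gamma kernel density with shape alpha and scale y/alpha (mean y)."""
    if x <= 0:
        return 0.0
    log_density = ((alpha - 1) * math.log(x) - x * alpha / y
                   - alpha * math.log(y / alpha) - math.lgamma(alpha))
    return math.exp(log_density)


def kernel_density(x, points, probs, kernel, spread):
    """Kernel density estimate: sum of p(y_j) * k_{y_j}(x)."""
    return sum(p * kernel(x, y, spread) for y, p in zip(points, probs))


points = [1.0, 1.3, 1.5, 2.1, 2.8]
probs = [1 / 8, 1 / 8, 2 / 8, 3 / 8, 1 / 8]
print(kernel_density(0.95, points, probs, uniform_kernel, 0.1))
print(kernel_density(2.0, points, probs, triangular_kernel, 1.0))
print(kernel_density(2.0, points, probs, gamma_kernel, 50))
```

At x = 0.95 with bandwidth 0.1 only the point at 1.0 contributes, so the first value printed is the plateau height 0.625 discussed above.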

It should be clear that the larger bandwidth provides more smoothing. In the limit, as the bandwidth approaches zero, the kernel density estimate matches the empirical estimate. Note that, if the bandwidth is too large, probability will be assigned to negative values, which may be an undesirable result. Methods exist for dealing with that issue, but they will not be presented here.

For the triangular kernel, each point is replaced by a triangle. Pictures for the same two bandwidths used previously appear in Figures 11.3 and 11.4.


Fig. 11.3 Triangular kernel density with bandwidth 0.1.


Fig. 11.4 Triangular kernel density with bandwidth 1.0.


Once again, the larger bandwidth provides more smoothing. The gamma kernel simply provides a mixture of gamma distributions where each data point provides the mean and the empirical probabilities provide the weights. The density function is⁵

$$f_{\alpha}(x) = \sum_{j=1}^{5} p(y_j)\,\frac{x^{\alpha - 1}e^{-x\alpha/y_j}}{(y_j/\alpha)^{\alpha}\,\Gamma(\alpha)}$$

and is graphed in Figures 11.5 and 11.6 for two α values. For this kernel, decreasing the value of α increases the amount of smoothing. Further discussion of the gamma kernel can be found in [23], where the author recommends $\alpha = \sqrt{n}/(\hat\mu_4'/\hat\mu_2'^{\,2} - 1)^{1/2}$. □


Fig. 11.6 Gamma kernel density with α = 50.


5 When computing values of the density function, overflow and underflow problems can be reduced by computing the logarithm of the elements of the ratio, that is, $(\alpha - 1)\ln x - x\alpha/y_j - \alpha\ln(y_j/\alpha) - \ln\Gamma(\alpha)$, and then exponentiating the result.


11.3.1 Exercises


11.22 Provide the formula for the Pareto kernel.


11.23 Construct a kernel density estimate for the time to surrender for Data Set D2. Be aware of the fact that this is a mixed distribution (probability is continuous from 0 to 5 but is discrete at 5).


11.24 (*) You are given the data in Table 11.6 on time to death. Using the uniform kernel with a bandwidth of 60, determine $\hat f(100)$.

Table 11.6 Data for Exercise 11.24

t_j    s_j    r_j
10     1      20
34     1      19
47     1      18
75     1      17
156    1      16
171    1      15


11.4 APPROXIMATIONS FOR LARGE DATA SETS 


11.4.1 Introduction 


When there are large amounts of data, constructing the Kaplan-Meier es- 
timate may require more sorting and counting than can be justified by the 
results. This is especially true if values of the distribution function are not 
needed at each value. For example, if the goal is to construct a mortality 
table, values are needed only at integral ages. The finer details of mortality 
table construction and alternative methods can be found in the texts by Bat- 
ten [11] and London [85]. While the context for the examples presented here 
will be the construction of mortality tables, the methods can apply anytime 
the data have been rounded. 

Suppose there are intervals given as $c_0 < c_1 < \cdots < c_k$. Let $d_j$ be the number of observations that are left truncated at a value in the interval $[c_j, c_{j+1})$. In a mortality study, this would be a count of the lives that were first observed at an age in the given range. Similarly, let $u_j$ be the number of observations that are right censored at a value in the interval $(c_j, c_{j+1}]$. Note the difference in the endpoints of the two intervals. This is necessary because truncation is possible at the left end of the first interval but not at the right end of the last interval. The reverse is true for censoring. Note that all observations are to contribute to some $d_j$, but only observations that are actually censored can contribute to some $u_j$. Let $x_j$ be the number of uncensored observations in the interval $(c_j, c_{j+1}]$. Then $n = \sum_{j=0}^{k-1} d_j = \sum_{j=0}^{k-1}(u_j + x_j)$ where n is the sample size. The computational advantage is that one pass through the data set allows these values to be accumulated and from there only this reduced set of values needs to be processed.


11.4.2 Kaplan-Meier type approximations 


In order to apply the Kaplan-Meier formula, assumptions must be made about the location of the values within each interval. The simplest is to assume that all of the uncensored observations in an interval occur at the same value within that interval, say $c_j^*$, that all left truncated values are less than $c_j^*$, and that all right censored values are greater than or equal to $c_j^*$. The risk set at $c_j^*$ is $r_j = \sum_{i=0}^{j} d_i - \sum_{i=0}^{j-1}(u_i + x_i)$. Rather than place all probability at the $c_j^*$ values (as is done by the Kaplan-Meier method), it is customary to evaluate the distribution function at the given endpoints and then smooth the function by interpolation (usually linear) between successive values. Then,

$$\hat F(c_0) = 0, \qquad \hat F(c_j) = 1 - \prod_{i=0}^{j-1}\left(1 - \frac{x_i}{r_i}\right), \quad j = 1, 2, \ldots, k. \tag{11.4}$$

Let

$$q_j = \Pr(X \le c_{j+1} \mid X > c_j) = \frac{S(c_j) - S(c_{j+1})}{S(c_j)}.$$

From (11.4),

$$\hat q_j = \frac{\prod_{i=0}^{j-1}\left(1 - \frac{x_i}{r_i}\right) - \prod_{i=0}^{j}\left(1 - \frac{x_i}{r_i}\right)}{\prod_{i=0}^{j-1}\left(1 - \frac{x_i}{r_i}\right)} = 1 - \left(1 - \frac{x_j}{r_j}\right) = \frac{x_j}{r_j}. \tag{11.5}$$


This is the traditional form of a life table estimator where the numerator has 
the number of observed deaths and the denominator is a measure of exposure 
(the number of lives available to die). For this formula, all who enter the 
study prior to or during the current interval are given a chance to die and 
all who have left prior to the interval are removed from consideration. If 
dollar amounts are being studied and the boundaries include all the possible 
values for deductibles and limits, this formula produces the exact product- 
limit estimate at the given values. For mortality studies, this is equivalent to 
having all lives enter on their birthdays (which may be true if insuring ages 
are used, see Exercise 11.26) and surrender on birthdays. 

Equation (11.5) can be generalized as follows. Let $P_j = \sum_{i=0}^{j-1}(d_i - u_i - x_i)$, the number of people under observation at age $c_j$. Continue to assume that all the uncensored observations occur at a fixed point in the interval. Further assume that at this time 100α% of those who will enter (be counted in $d_j$) have done so and that 100β% of those who will be censored (be counted in $u_j$) have done so. Then the risk set is $r_j = P_j + \alpha d_j - \beta u_j$. Equation (11.5) is obtained by setting α = 1 and β = 0. An alternative is to set α = β = 0.5. This is equivalent to having the entrants and exits spread uniformly throughout the interval. The result is $r_j = P_j + 0.5(d_j - u_j)$. Because $P_{j+1} = P_j + d_j - u_j - x_j$, this can be rewritten as

$$r_j = 0.5(P_j + P_{j+1} + x_j), \tag{11.6}$$

a formula commonly used in constructing mortality tables. At times it may be necessary to make different assumptions for different intervals, as illustrated in Example 11.20.


11.4.3 Multiple-decrement tables 


The goal of all the estimation procedures in this text is to deduce the probability distribution for the variable of interest in the absence of truncation and censoring. For loss data, that would be the probabilities if there were no deductible or limit. For lifetime data it would be the probabilities if we could follow people from birth to death. In the language of Actuarial Mathematics [16], these are called single-decrement rates and are denoted $q_j'$. It is often desired to create a table of multiple-decrement probabilities, denoted $q_j$. A superscript identifies the decrement of interest. For example, suppose the decrements were death (d), withdrawal (w), and retirement (r). Then $q_j'^{(w)}$ is the probability that a person age $c_j$ withdraws prior to age $c_{j+1}$ in an environment where death and retirement are not possible while $q_j^{(r)}$ is the probability that a person age $c_j$ retires prior to age $c_{j+1}$ in an environment where prior death or withdrawal eliminates the chance of retirement. Multiple-decrement tables are often constructed by obtaining the single-decrement rates from different sources and then combining with a formula such as

$$q_j^{(g)} = \frac{\ln(1 - q_j'^{(g)})}{\ln(1 - q_j^{(T)})}\,q_j^{(T)}, \qquad q_j^{(T)} = 1 - \prod_{i=1}^{m}(1 - q_j'^{(i)}), \tag{11.7}$$

where g represents an arbitrary decrement and m is the total number of decrements.
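
Formula (11.7) can be applied mechanically once the single-decrement rates are available. The Python sketch below is a minimal illustration; the function name `multiple_decrement` and the dictionary interface are assumptions, and the inputs 0.0339 (death) and 0.0984 (withdrawal) are the rounded first-row rates shown in Table 11.8.

```python
import math


def multiple_decrement(single_rates):
    """Convert single-decrement rates to multiple-decrement probabilities.

    single_rates maps a decrement label to its single-decrement rate q'.
    Returns (q_total, probs) following (11.7).
    """
    survive_all = 1.0
    for q_prime in single_rates.values():
        survive_all *= 1.0 - q_prime
    q_total = 1.0 - survive_all
    if q_total == 0.0:
        # No decrement is possible, so every probability is zero.
        return 0.0, {name: 0.0 for name in single_rates}
    log_total = math.log(1.0 - q_total)
    probs = {
        name: math.log(1.0 - q_prime) / log_total * q_total
        for name, q_prime in single_rates.items()
    }
    return q_total, probs


q_total, probs = multiple_decrement({"d": 0.0339, "w": 0.0984})
print(round(q_total, 4), {k: round(v, 4) for k, v in probs.items()})
```

Because the logarithms of the single-decrement survival probabilities add up to the logarithm of the total survival probability, the multiple-decrement probabilities always sum to the total.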


Example 11.20 Estimate both single- and multiple-decrement quantities us- 
ing Data Set D2 and the methods of this section. Make reasonable assump- 
tions. 


First consider the decrement death. In the notation of this section, the relevant quantities are in Table 11.7. For this setting, more than one assumption is needed. For $d_0 = 32$ it is clear that the 30 values that are exactly zero should be treated as such (policies followed from the beginning) while the 2 policies that entered after issue require an assumption. It makes sense to assume a uniform spread. Then $r_0 = 30 + 0.5(2) - 0.5(3) = 29.5$. The other r values are calculated using (11.6). Also note that the 17 policies that were still active after five years are all assumed to be censored at time 5, rather than be spread uniformly through the fifth year.

For withdrawals the values of $q_j'^{(w)}$ are given in Table 11.8 along with the calculated multiple decrement values using (11.7). □


Table 11.7 Single-decrement calculations for Example 11.20

j    d_j    u_j    x_j    P_j    r_j     q_j'(d)    F(j)
0    32     3      1      0      29.5    0.0339     0.0000
1    2      2      0      28     28.0    0.0000     0.0339
2    3      3      2      28     28.0    0.0714     0.0339
3    3      3      3      26     26.0    0.1154     0.1029
4    0      21     2      23     21.0    0.0952     0.2064


Table 11.8 Multiple-decrement calculations for Example 11.20

j    q_j'(d)    q_j'(w)    q_j(T)    q_j(d)    q_j(w)
0    0.0339     0.0984     0.1289    0.0322    0.0967
1    0.0000     0.0690     0.0690    0.0000    0.0690
2    0.0714     0.1053     0.1692    0.0676    0.1015
3    0.1154     0.1154     0.2175    0.1087    0.1087
4    0.0952     0.1818     0.2597    0.0864    0.1733


Example 11.21 Loss data for policies with deductibles of 0, 250, and 500 and policy limits of 5,000, 7,500, and 10,000 were collected. The data are in Table 11.9. Use the methods of this section to estimate the distribution function for losses.


The calculations appear in Table 11.10. Because the deductibles and limits are at the endpoints of intervals, the only reasonable assumption is α = 1 and β = 0. □


Table 11.9 Data for Example 11.21

                        Deductible
Range             0      250     500     Total
0-100             15                     15
100-250           16                     16
250-500           34     96              130
500-1,000         73     175     251     499
1,000-2,500       131    339     478     948
2,500-5,000       83     213     311     607
5,000-7,500       12     48      88      148
7,500-10,000      1      4       11      16
At 5,000          7      17      18      42
At 7,500          5      10      15      30
At 10,000         2      1       4       7
Total             379    903     1,176   2,458


Table 11.10 Calculations for Example 11.21

c_j       d_j      u_j    x_j    P_j      r_j      q_j       F(c_j)
0         379      0      15     0        379      0.0396    0.0000
100       0        0      16     364      364      0.0440    0.0396
250       903      0      130    348      1,251    0.1039    0.0818
500       1,176    0      499    1,121    2,297    0.2172    0.1772
1,000     0        0      948    1,798    1,798    0.5273    0.3560
2,500     0        42     607    850      850      0.7141    0.6955
5,000     0        30     148    201      201      0.7363    0.9130
7,500     0        7      16     23       23       0.6957    0.9770
10,000                                                       0.9930


11.4.4 Exercises


11.25 Verify the calculations in Table 11.8.


11.26 When life insurance policies are issued, it is customary to assign a whole-number age to the insured and then the premium associated with that age is charged. The interpretation of $q_x$ changes from "What is the probability someone having their xth birthday dies in the next year?" to "What is the probability that a person who was assigned age x − k at issue k years ago dies in the next year?" Such age assignments are called "insuring ages" and effectively move the birthday to the policy issue date. As a result, insured lives tend to enter observation on their birthday (artificially assigned as the policy issue date). This makes the d value exact (with α = 1) rather than approximate. However, withdrawal can take place at any time, making β =
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Table 11.11 Data for Exercise 11.26 


d u T d u £ 
45 46.0 45 45.8 

45 46.0 46 47.0 

45 45.3 46 47.0 

45 46.7 46 46.3 

45 45.4 46 46.2 
45 47.0 46 46.4 
45 45.4 46 46.9 


Table 11.12 Data for Exercise 11.27 


Deductible Payment Deductible Payment 
250 2,221 500 3,660 
250 2,500* 500 215 
250 207 500 1,302 
250 3,735 500 10,000* 
250 5,000* 1,000 1,643 
250 517 1,000 3,395 
250 5,743 1,000 3,981 
500 2,500* 1,000 3,836 
500 525 1,000 5,000* 
500 4,393 1,000 1,850 


500 5,000* 1,000 6,722 


*Amount paid was at the policy limit 


0.5 a reasonable assumption.® For the data in Table 11.11 estimate q45 and qas 
using both the exact Kaplan—Meier estimate and the method of this section. 


11.27 Twenty-two insurance payments are recorded in Table 11.12. Use the 
fewest reasonable number of intervals and the method of this section to esti- 
mate the probability that a policy with a deductible of 500 will have a payment 
in excess of 5,000. 


However, just as in Example 11.20 where different $d$ values received different treatment,
here there can be $u$ values that deserve different treatment. Observation of an individual
can end because the individual leaves or because the observation period ends. For studies
of insured lives it is common for observation to end at a policy anniversary and thus at a
whole-number insuring age. For them, $\beta = 0$ is an appropriate assumption.

Part IV

Parametric statistical methods

12 Parameter estimation


If a phenomenon is to be modeled using a parametric model, it is necessary to
assign values to the parameters. This could be done arbitrarily, but it would
seem to be more reasonable to base the assignment on observations from that
phenomenon. In particular, we will assume that $n$ independent observations
have been collected. For some of the techniques it will be further assumed
that all the observations are from the same random variable. For others, that
restriction will be relaxed.

The methods introduced in Section 12.1 are relatively easy to implement 
but tend to give poor results. Section 12.2 covers maximum likelihood es- 
timation. This method is more difficult to use but has superior statistical 
properties and is considerably more flexible. 


12.1 METHOD OF MOMENTS AND PERCENTILE MATCHING 


For these methods we assume that all $n$ observations are from the same parametric
distribution. In particular, let the distribution function be given by

\[ F(x|\theta), \qquad \theta^T = (\theta_1, \theta_2, \ldots, \theta_p), \]

where $\theta^T$ is the transpose of $\theta$. That is, $\theta$ is a column vector containing the
$p$ parameters to be estimated. Furthermore, let $\mu'_k(\theta) = E(X^k|\theta)$ be the kth
raw moment and let $\pi_g(\theta)$ be the 100gth percentile of the random variable.
That is, $F[\pi_g(\theta)|\theta] = g$. If the distribution function is continuous, there will
be at least one solution to that equation.

For a sample of $n$ independent observations from this random variable, let
$\hat{\mu}'_k = \frac{1}{n}\sum_{j=1}^{n} x_j^k$ be the empirical estimate of the kth moment and let $\hat{\pi}_g$ be
the empirical estimate of the 100gth percentile.

Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.


Definition 12.1 A method-of-moments estimate of $\theta$ is any solution of
the $p$ equations

\[ \mu'_k(\theta) = \hat{\mu}'_k, \qquad k = 1, 2, \ldots, p. \]


The motivation for this estimator is that it produces a model that has the 
same first p raw moments as the data (as represented by the empirical dis- 
tribution). The traditional definition of the method of moments uses positive 
integers for the moments. Arbitrary negative or fractional moments could also 
be used. In particular, when estimating parameters for inverse distributions,
matching negative moments may be a superior approach.¹


Example 12.2 Use the method of moments to estimate parameters for the
exponential, gamma, and Pareto distributions for Data Set B from Chapter
10.

The first two sample moments are

\[ \hat{\mu}'_1 = \tfrac{1}{20}(27 + \cdots + 15{,}743) = 1{,}424.4, \]
\[ \hat{\mu}'_2 = \tfrac{1}{20}(27^2 + \cdots + 15{,}743^2) = 13{,}238{,}441.9. \]

For the exponential distribution the equation is

\[ \theta = 1{,}424.4 \]

with the obvious solution, $\hat{\theta} = 1{,}424.4$.
For the gamma distribution, the two equations are

\[ E(X) = \alpha\theta = 1{,}424.4, \]
\[ E(X^2) = \alpha(\alpha + 1)\theta^2 = 13{,}238{,}441.9. \]


Dividing the second equation by the square of the first equation yields

\[ \frac{\alpha + 1}{\alpha} = 6.52489, \qquad 1 = 5.52489\alpha \]


1 One advantage is that with appropriate moments selected the equations must have a
solution within the range of allowable parameter values.

and so $\hat{\alpha} = 1/5.52489 = 0.18100$ and $\hat{\theta} = 1{,}424.4/0.18100 = 7{,}869.61$.
For the Pareto distribution, the two equations are

\[ E(X) = \frac{\theta}{\alpha - 1} = 1{,}424.4, \]
\[ E(X^2) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)} = 13{,}238{,}441.9. \]

Dividing the second equation by the square of the first equation yields

\[ \frac{2(\alpha - 1)}{\alpha - 2} = 6.52489 \]

with a solution of $\hat{\alpha} = 2.442$ and then $\hat{\theta} = 1{,}424.4(1.442) = 2{,}053.985$. □
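The algebra in Example 12.2 can be checked with a short Python sketch. The function names below are hypothetical; the only inputs are the two sample moments quoted in the example, and each function simply solves the corresponding pair of moment equations in closed form.

```python
def mom_exponential(m1):
    """Exponential: E(X) = theta, so theta-hat is the sample mean."""
    return m1


def mom_gamma(m1, m2):
    """Gamma: E(X) = alpha*theta, E(X^2) = alpha*(alpha+1)*theta^2."""
    ratio = m2 / m1 ** 2              # equals (alpha + 1) / alpha
    if ratio <= 1.0:
        raise ValueError("moment equations have no gamma solution")
    alpha = 1.0 / (ratio - 1.0)
    return alpha, m1 / alpha


def mom_pareto(m1, m2):
    """Pareto: E(X) = theta/(alpha-1), E(X^2) = 2*theta^2/((alpha-1)(alpha-2))."""
    ratio = m2 / m1 ** 2              # equals 2(alpha - 1) / (alpha - 2)
    if ratio <= 2.0:
        raise ValueError("moment equations have no Pareto solution with alpha > 2")
    alpha = (2.0 * ratio - 2.0) / (ratio - 2.0)
    return alpha, m1 * (alpha - 1.0)


# Sample moments of Data Set B as quoted in Example 12.2.
M1, M2 = 1424.4, 13238441.9

theta_exp = mom_exponential(M1)
alpha_gam, theta_gam = mom_gamma(M1, M2)
alpha_par, theta_par = mom_pareto(M1, M2)
print(theta_exp, alpha_gam, theta_gam, alpha_par, theta_par)
```

The guards on `ratio` make concrete the remark that the moment equations need not have a solution: a Pareto fit by the first two moments requires the ratio $\hat{\mu}'_2 / (\hat{\mu}'_1)^2$ to exceed 2. The gamma scale computed here differs slightly from 7,869.61 in the last digits because the text divides by the rounded value 0.18100.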


There is no guarantee that the equations will have a solution or, if there is 
a solution, that it will be unique. 


Definition 12.3 A percentile matching estimate of $\theta$ is any solution of
the $p$ equations

\[ \pi_{g_k}(\theta) = \hat{\pi}_{g_k}, \qquad k = 1, 2, \ldots, p, \]

where $g_1, g_2, \ldots, g_p$ are $p$ arbitrarily chosen percentiles. From the definition
of percentile, the equations can also be written

\[ F(\hat{\pi}_{g_k}|\theta) = g_k, \qquad k = 1, 2, \ldots, p. \]


The motivation for this estimator is that it produces a model with p per- 
centiles that match the data (as represented by the empirical distribution). 
As with the method of moments, there is no guarantee that the equations will 
have a solution or, if there is a solution, that it will be unique. One problem 
with this definition is that percentiles for discrete random variables (such as 
the empirical distribution) are not always well defined. For example, Data 
Set B has 20 observations. Any number between 384 and 457 has 10 observa- 
tions below and 10 above and so could serve as the median. The convention 
is to use the midpoint. However, for other percentiles, there is no “official”
interpolation scheme.² The following definition will be used here.


Definition 12.4 The smoothed empirical estimate of a percentile is found
by

\[ \hat{\pi}_g = (1 - h)x_{(j)} + h x_{(j+1)}, \quad \text{where} \]
\[ j = \lfloor (n + 1)g \rfloor \quad \text{and} \quad h = (n + 1)g - j. \]

Here $\lfloor \cdot \rfloor$ indicates the greatest integer function and $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
are the order statistics from the sample.

2 Hyndman and Fan [65] present nine different methods. They recommend a slight modification
of the one presented here using $j = \lfloor g(n + \frac{1}{3}) + \frac{1}{3} \rfloor$ and $h = g(n + \frac{1}{3}) + \frac{1}{3} - j$.


Unless there are two or more data points with the same value, no two
percentiles will have the same value. One feature of this definition is that $\hat{\pi}_g$
cannot be obtained for $g < 1/(n+1)$ or $g > n/(n+1)$. This seems reasonable as
we should not expect to be able to infer the value of large or small percentiles
from small samples. We will use the smoothed version whenever an empirical
percentile estimate is called for.
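A minimal Python sketch of Definition 12.4 follows. The function name `smoothed_percentile` and the nine-point sample are illustrative assumptions, not part of the text; the sample was chosen so that the interpolation can be checked by hand.

```python
import math


def smoothed_percentile(sample, g):
    """Smoothed empirical estimate of the 100g-th percentile (Definition 12.4)."""
    xs = sorted(sample)
    n = len(xs)
    position = (n + 1) * g
    j = math.floor(position)          # greatest integer in (n+1)g
    h = position - j
    # The estimate is undefined for g < 1/(n+1) or g > n/(n+1).
    if j < 1 or j > n or (j == n and h > 0):
        raise ValueError("g must lie between 1/(n+1) and n/(n+1)")
    if h == 0:
        return float(xs[j - 1])       # lands exactly on an order statistic
    # Order statistics are 1-based in the text, 0-based in the list.
    return (1 - h) * xs[j - 1] + h * xs[j]


# Hypothetical sample of nine values, so (n+1)g = 10g.
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90]
p25 = smoothed_percentile(sample, 0.25)   # j = 2, h = 0.5
p50 = smoothed_percentile(sample, 0.50)   # j = 5, h = 0
print(p25, p50)
```

For this sample the 25th percentile falls halfway between the second and third order statistics, and any request below $g = 0.1$ or above $g = 0.9$ raises an error, mirroring the restriction stated above.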


Example 12.5 Use percentile matching to estimate parameters for the expo- 
nential and Pareto distributions for Data Set B. 


For the exponential distribution, select the 50th percentile. The empirical
estimate is the traditional median of $\hat{\pi}_{0.5} = (384 + 457)/2 = 420.5$ and the
equation to solve is

\[ 0.5 = F(420.5|\theta) = 1 - e^{-420.5/\theta}, \]
\[ \ln 0.5 = \frac{-420.5}{\theta}, \]
\[ \hat{\theta} = \frac{-420.5}{\ln 0.5} = 606.65. \]

For the Pareto distribution, select the 30th and 80th percentiles. The
smoothed empirical estimates are found as follows:

30th: $j = \lfloor 21(0.3) \rfloor = \lfloor 6.3 \rfloor = 6$, $h = 6.3 - 6 = 0.3$,
      $\hat{\pi}_{0.3} = 0.7(161) + 0.3(243) = 185.6$,
80th: $j = \lfloor 21(0.8) \rfloor = \lfloor 16.8 \rfloor = 16$, $h = 16.8 - 16 = 0.8$,
      $\hat{\pi}_{0.8} = 0.2(1{,}193) + 0.8(1{,}340) = 1{,}310.6$.


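The equations to solve are

\[ 0.3 = F(185.6) = 1 - \left(\frac{\theta}{185.6 + \theta}\right)^{\alpha}, \]
\[ 0.8 = F(1{,}310.6) = 1 - \left(\frac{\theta}{1{,}310.6 + \theta}\right)^{\alpha}, \]
\[ \ln 0.7 = -0.356675 = \alpha \ln\left(\frac{\theta}{185.6 + \theta}\right), \]
\[ \ln 0.2 = -1.609438 = \alpha \ln\left(\frac{\theta}{1{,}310.6 + \theta}\right), \]
\[ \frac{-1.609438}{-0.356675} = 4.512338 = \frac{\ln\left(\dfrac{\theta}{1{,}310.6 + \theta}\right)}{\ln\left(\dfrac{\theta}{185.6 + \theta}\right)}. \]

Any of the methods from Appendix F can be used to solve this equation for
$\hat{\theta} = 715.03$. Then, from the first equation,

\[ 0.3 = 1 - \left(\frac{715.03}{185.6 + 715.03}\right)^{\alpha}, \]

which yields $\hat{\alpha} = 1.54559$. □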
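As an illustration of how the last equation might be solved numerically, here is a Python sketch using plain bisection. Bisection is used only because it is simple; it is not claimed to be one of the methods of Appendix F, which is not reproduced here. The function name and the bracketing interval [1, 10^6] are assumptions.

```python
import math


def pareto_percentile_match(p1, g1, p2, g2, lo=1.0, hi=1.0e6, tol=1e-10):
    """Match two percentiles of a Pareto(alpha, theta) by bisection on theta."""
    target = math.log(1.0 - g2) / math.log(1.0 - g1)

    def gap(theta):
        # Ratio of the two log terms minus its target value; zero at the solution.
        return (math.log(theta / (p2 + theta))
                / math.log(theta / (p1 + theta))) - target

    if gap(lo) * gap(hi) > 0:
        raise ValueError("the bracket [lo, hi] does not contain a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    theta = 0.5 * (lo + hi)
    alpha = math.log(1.0 - g1) / math.log(theta / (p1 + theta))
    return alpha, theta


# Exponential case: one equation with a closed-form solution.
theta_exp = -420.5 / math.log(0.5)

# Pareto case: smoothed 30th and 80th percentiles from Example 12.5.
alpha_hat, theta_hat = pareto_percentile_match(185.6, 0.3, 1310.6, 0.8)
print(theta_exp, alpha_hat, theta_hat)
```

The ratio of the two logarithms rises from 1 (as $\theta \to 0$) toward $1{,}310.6/185.6 \approx 7.06$ (as $\theta \to \infty$), so a target of about 4.51 is bracketed by the chosen interval.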


The estimates are much different from those obtained in Example 12.2. 
This is one indication that these methods may not be particularly reliable. 


12.1.1 Exercises 


12.1 Determine the method-of-moments estimate for an exponential model 
for Data Set B with observations censored at 250. 


12.2 Determine the method-of-moments estimate for a lognormal model for 
Data Set B. 


12.3 (*) The 20th and 80th percentiles from a sample are 5 and 12, respectively.
Using the percentile matching method, estimate $S(8)$ assuming the
population has a Weibull distribution.


12.4 (*) From a sample you are given that the mean is 35,000, the standard 
deviation is 75,000, the median is 10,000, and the 90th percentile is 100,000. 
Using the percentile matching method, estimate the parameters of a Weibull 
distribution. 


12.5 (*) A sample of size 5 produced the values 4, 5, 21, 99, and 421. You 
fit a Pareto distribution using the method of moments. Determine the 95th 
percentile of the fitted distribution. 


12.6 (*) In year 1 there are 100 claims with an average size of 10,000 and in
year 2 there are 200 claims with an average size of 12,500. Inflation increases
the size of all claims by 10% per year. A Pareto distribution with $\alpha = 3$ and
$\theta$ unknown is used to model the claim size distribution. Estimate $\theta$ for year
3 using the method of moments.


12.7 (*) From a random sample the 20th percentile is 18.25 and the 80th 
percentile is 35.8. Estimate the parameters of a lognormal distribution using 
percentile matching and then use these estimates to estimate the probability 
of observing a value in excess of 30. 


12.8 (*) A claim process is a mixture of two random variables A and B, where
A has an exponential distribution with a mean of 1 and B has an exponential
distribution with a mean of 10. A weight of $p$ is assigned to distribution A and
$1 - p$ to distribution B. The standard deviation of the mixture is 2. Estimate
$p$ by the method of moments.


12.9 (*) A random sample of 20 observations has been ordered as follows: 


12 16 20 23 26 28 30 32 33 35 
36 38 39 40 41 43 45 47 50 57 


Determine the 60th sample percentile using the smoothed empirical estimate. 


12.10 (*) The following 20 wind losses (in millions of dollars) were recorded
in one year:

1 1 1 1 1 2 2 3 3 4 

6 6 8 10 13 14 15 18 22 25 


Determine the sample 75th percentile using the smoothed empirical esti- 
mate. 


12.11 (*) The observations 1,000, 850, 750, 1,100, 1,250, and 900 were obtained
as a random sample from a gamma distribution with unknown parameters
$\alpha$ and $\theta$. Estimate these parameters by the method of moments.


12.12 (*) A random sample of claims has been drawn from a loglogistic dis- 
tribution. In the sample, 80% of the claims exceed 100 and 20% exceed 400. 
Estimate the loglogistic parameters by percentile matching. 


12.13 (*) Let $x_1, \ldots, x_n$ be a random sample from a population with cdf
$F(x) = x^p$, $0 < x < 1$. Determine the method-of-moments estimate of $p$.


12.14 (*) A random sample of 10 claims obtained from a gamma distribution 
is given below: 


1,500 6,000 3,500 3,800 1,800 5,500 4,800 4,200 3,900 3,000 
Estimate $\alpha$ and $\theta$ by the method of moments.


12.15 (*) A random sample of five claims from a lognormal distribution is 
given below: 


500 1,000 1,500 2,500 4,500. 


Estimate $\mu$ and $\sigma$ by the method of moments. Estimate the probability
that a loss will exceed 4,500.


12.16 (*) The random variable X has pdf $f(x) = \theta^{-2} x \exp(-0.5x^2/\theta^2)$, $x,
\theta > 0$. For this random variable, $E(X) = (\theta/2)\sqrt{2\pi}$ and $\mathrm{Var}(X) = 2\theta^2 -
\pi\theta^2/2$. You are given the following five observations:

4.9 1.8 3.4 6.9 4.0

Determine the method-of-moments estimate of $\theta$.

Table 12.1 Data for Exercise 12.18

No. of claims   No. of policies
0               9,048
1               905
2               45
3               2
4+              0

Table 12.2 Data for Exercise 12.19

No. of claims   No. of policies
0               861
1               121
2               13
3               3
4               1
5               0
6               1
7+              0


12.17 The random variable X has pdf $f(x) = \alpha\lambda^{\alpha}(\lambda + x)^{-\alpha-1}$, $x, \alpha, \lambda > 0$.
It is known that $\lambda = 1{,}000$. You are given the following five observations:

43 145 233 396 775

Determine the method-of-moments estimate of $\alpha$.


12.18 Use the data in Table 12.1 to determine the method-of-moments esti- 
mate of the parameters of the negative binomial model. 


12.19 Use the data in Table 12.2 to determine the method-of-moments esti- 
mate of the parameters of the negative binomial model. 


12.2 MAXIMUM LIKELIHOOD ESTIMATION 


12.2.1 Introduction 


Estimation by the method of moments and percentile matching is often easy
to do, but these estimators tend to perform poorly. The main reason for this
is that they use a few features of the data, rather than the entire set of
observations. It is particularly important to use as much information as possible
when the population has a heavy right tail. For example, when estimating
parameters for the normal distribution, the sample mean and variance are
sufficient.³ However, when estimating parameters for a Pareto distribution,
it is important to know all the extreme observations in order to successfully
estimate $\alpha$. Another drawback of these methods is that they require that all
the observations are from the same random variable. Otherwise, it is not clear
what to use for the population moments or percentiles. For example, if half
the observations have a deductible of 50 and half have a deductible of 100,
it is not clear to what the sample mean should be equated.⁴ Finally, these
methods allow the analyst to make arbitrary decisions regarding the moments
or percentiles to use.
There are a variety of estimators that use the individual data points. All
of them are implemented by setting an objective function and then determining
the parameter values that optimize that function. For example, we could
estimate parameters by minimizing the maximum difference between the
distribution function for the parametric model and the distribution function for
the Nelson-Aalen estimate. Of the many possibilities, the only one used here
is the maximum likelihood estimator. The general form of this estimator is
presented in this introduction. This is followed with useful special cases.

To define the maximum likelihood estimator, let the data set consist of $n$
events $A_1, \ldots, A_n$, where $A_j$ is whatever was observed for the jth observation.
For example, $A_j$ may consist of a single point or an interval. The latter arises
with grouped data or when there is censoring. For example, when there is
censoring at $u$ and a censored observation is observed, the observed event
is the interval from $u$ to infinity. Further assume that the event $A_j$ results
from observing the random variable $X_j$. The random variables $X_1, \ldots, X_n$
need not have the same probability distribution, but their distributions must
depend on the same parameter vector, $\theta$. In addition, the random variables
are assumed to be independent.


Definition 12.6 The likelihood function is

\[ L(\theta) = \prod_{j=1}^{n} \Pr(X_j \in A_j | \theta) \]

and the maximum likelihood estimate of $\theta$ is the vector that maximizes
the likelihood function.⁵


3This applies both in the formal statistical definition of sufficiency (not covered here) and 
in the conventional sense. If the population has a normal distribution, the sample mean 
and variance convey as much information as the original observations. 

4One way to rectify that drawback is to first determine a data-dependent model such as 
the Kaplan-Meier estimate. Then use percentiles or moments from that model. 

5 Some authors write the likelihood function as $L(\theta|\mathbf{x})$, where the vector $\mathbf{x}$ represents the
observed data. Because observed data can take many forms, the dependence of the likelihood
function on the data is suppressed in the notation.

There is no guarantee that the function has a maximum at eligible parameter
values. It is possible that as various parameters become zero or infinite
the likelihood function will continue to increase. Care must be taken when
maximizing this function because there may be local maxima in addition to
the global maximum. Often, it is not possible to analytically maximize the
likelihood function (by setting partial derivatives equal to zero). Numerical
approaches, such as those outlined in Appendix F, will usually be needed.

Because the observations are assumed to be independent, the product in
the definition represents the joint probability $\Pr(X_1 \in A_1, \ldots, X_n \in A_n | \theta)$,
that is, the likelihood function is the probability of obtaining the sample
results that were obtained, given a particular parameter value. The estimate
is then the parameter value that produces the model under which the actual
observations are most likely to be observed. One of the major attractions of
this estimator is that it is almost always available. That is, if you can write an
expression for the desired probabilities, you can execute this method. If you
cannot write and evaluate an expression for probabilities using your model,
there is no point in postulating that model in the first place because you will
not be able to use it to solve your problem.


Example 12.7 Suppose the data in Data Set B were censored at 250. Determine
the maximum likelihood estimate of $\theta$ for an exponential distribution.

The first seven data points are uncensored. For them, the set $A_j$ contains
the single point equal to the observation $x_j$. When calculating the likelihood
function for a single point for a continuous model, it is necessary to interpret
$\Pr(X_j = x_j) = f(x_j)$. That is, the density function should be used. Thus the
first seven terms of the product are

\[ f(27)f(82)\cdots f(243) = \theta^{-1}e^{-27/\theta}\,\theta^{-1}e^{-82/\theta}\cdots\theta^{-1}e^{-243/\theta} = \theta^{-7}e^{-909/\theta}. \]

For the final 13 terms, the set $A_j$ is the interval from 250 to infinity and
therefore $\Pr(X_j \in A_j) = \Pr(X_j > 250) = e^{-250/\theta}$. There are 13 such factors,
making the likelihood function

\[ L(\theta) = \theta^{-7}e^{-909/\theta}\left(e^{-250/\theta}\right)^{13} = \theta^{-7}e^{-4{,}159/\theta}. \]

It is easier to maximize the logarithm of the likelihood function. Because it
occurs so often, we denote the loglikelihood function as $l(\theta) = \ln L(\theta)$.
Then

\[ l(\theta) = -7\ln\theta - 4{,}159\,\theta^{-1}, \]
\[ l'(\theta) = -7\theta^{-1} + 4{,}159\,\theta^{-2} = 0, \]
\[ \hat{\theta} = \frac{4{,}159}{7} = 594.14. \]

In this case, the calculus technique of setting the first derivative equal to zero
is easy to do. Also, evaluating the second derivative at this solution produces
a negative number, verifying that this solution is a maximum. □
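The calculation in Example 12.7 can be sketched in Python as follows. Data Set B is not reproduced in this section, so the twenty values below are an assumption about its contents; they do reproduce the sample moments quoted in Example 12.2 and the sum 909 of the seven uncensored values. The function names are hypothetical.

```python
import math

# Assumed contents of Data Set B (Chapter 10); see the caveat above.
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]


def censored_exponential_loglik(theta, data, limit):
    """Loglikelihood for exponential data right censored at `limit`."""
    total = 0.0
    for x in data:
        if x < limit:
            total += -math.log(theta) - x / theta   # ln f(x): exact value known
        else:
            total += -limit / theta                 # ln Pr(X > limit): censored
    return total


def censored_exponential_mle(data, limit):
    """Closed-form maximizer: total observed amount / number of uncensored values."""
    uncensored = [x for x in data if x < limit]
    if not uncensored:
        raise ValueError("no uncensored observations; the likelihood has no maximum")
    censored_count = len(data) - len(uncensored)
    return (sum(uncensored) + censored_count * limit) / len(uncensored)


theta_hat = censored_exponential_mle(DATA_SET_B, 250)
print(round(theta_hat, 2))
```

The closed form follows from setting $l'(\theta) = 0$ exactly as in the example: the numerator is 909 + 13(250) = 4,159 and the denominator is the number of uncensored observations, 7.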
12.2.2 Complete, individual data

When there is no truncation and no censoring, and the value of each observation
is recorded, it is easy to write the loglikelihood function:

\[ L(\theta) = \prod_{j=1}^{n} f_{X_j}(x_j|\theta), \qquad l(\theta) = \sum_{j=1}^{n} \ln f_{X_j}(x_j|\theta). \]

The notation indicates that it is not necessary for each observation to come
from the same distribution.


Example 12.8 Using Data Set B determine the maximum likelihood estimates
for an exponential distribution, for a gamma distribution where $\alpha$ is
known to equal 2, and for a gamma distribution where both parameters are
unknown.

For the exponential distribution, the general solution is

\[ l(\theta) = \sum_{j=1}^{n} \left(-\ln\theta - x_j\theta^{-1}\right) = -n\ln\theta - n\bar{x}\theta^{-1}, \]
\[ l'(\theta) = -n\theta^{-1} + n\bar{x}\theta^{-2} = 0, \]
\[ n\theta = n\bar{x}, \]
\[ \hat{\theta} = \bar{x}. \]

For Data Set B, $\hat{\theta} = \bar{x} = 1{,}424.4$. The value of the loglikelihood function is
−165.23. For this situation the method-of-moments and maximum likelihood
estimates are identical.

For the gamma distribution with $\alpha = 2$,

\[ f(x|\theta) = \frac{x^{2-1}e^{-x/\theta}}{\Gamma(2)\theta^2} = x\theta^{-2}e^{-x/\theta}, \]
\[ \ln f(x|\theta) = \ln x - 2\ln\theta - x\theta^{-1}, \]
\[ l(\theta) = \sum_{j=1}^{n} \ln x_j - 2n\ln\theta - n\bar{x}\theta^{-1}, \]
\[ l'(\theta) = -2n\theta^{-1} + n\bar{x}\theta^{-2} = 0, \]
\[ \hat{\theta} = \tfrac{1}{2}\bar{x}. \]

For Data Set B, $\hat{\theta} = 1{,}424.4/2 = 712.2$ and the value of the loglikelihood
function is −179.98. Again, this estimate is the same as the method-of-moments
estimate.


For the gamma distribution with unknown parameters the equation is not
as simple.

\[ f(x|\alpha, \theta) = \frac{x^{\alpha-1}e^{-x/\theta}}{\Gamma(\alpha)\theta^{\alpha}}, \]
\[ \ln f(x|\alpha, \theta) = (\alpha - 1)\ln x - x\theta^{-1} - \ln\Gamma(\alpha) - \alpha\ln\theta. \]

The partial derivative with respect to $\alpha$ requires the derivative of the gamma
function. The resulting equation cannot be solved analytically. Using numerical
methods, the estimates are $\hat{\alpha} = 0.55616$ and $\hat{\theta} = 2{,}561.1$ and the value
of the loglikelihood function is −162.29. These do not match the
method-of-moments estimates. □
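One way to carry out the numerical step is sketched below in Python. Two assumptions are worth flagging. First, Data Set B is not reproduced in this section, so the values used are an assumption about its contents (they match the moments quoted in Example 12.2). Second, the search is a golden-section search over the profile loglikelihood, in which $\theta$ is replaced by $\bar{x}/\alpha$ (the value that maximizes the loglikelihood for fixed $\alpha$); this is a convenient choice for illustration, not necessarily a method from Appendix F.

```python
import math

# Assumed contents of Data Set B (Chapter 10).
DATA_SET_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]


def gamma_loglik(alpha, theta, data):
    """Gamma loglikelihood for complete, individual data."""
    n = len(data)
    return ((alpha - 1.0) * sum(math.log(x) for x in data)
            - sum(data) / theta
            - n * math.lgamma(alpha)
            - n * alpha * math.log(theta))


def golden_max(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximum of a unimodal function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d          # maximum lies in [lo, d]
        else:
            lo = c          # maximum lies in [c, hi]
    return 0.5 * (lo + hi)


def gamma_mle(data, lo=0.01, hi=50.0):
    """Maximize over alpha with theta profiled out as mean/alpha."""
    mean = sum(data) / len(data)
    alpha = golden_max(lambda a: gamma_loglik(a, mean / a, data), lo, hi)
    return alpha, mean / alpha


alpha_hat, theta_hat = gamma_mle(DATA_SET_B)
loglik_hat = gamma_loglik(alpha_hat, theta_hat, DATA_SET_B)
loglik_exp = gamma_loglik(1.0, 1424.4, DATA_SET_B)    # exponential case
loglik_a2 = gamma_loglik(2.0, 712.2, DATA_SET_B)      # gamma with alpha = 2
print(alpha_hat, theta_hat, loglik_hat, loglik_exp, loglik_a2)
```

Setting `alpha` to 1 or 2 in `gamma_loglik` recovers the two restricted fits from the example, which gives a quick consistency check on the loglikelihood values quoted there.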


12.2.3 Complete, grouped data 


When data are complete and grouped, the observations may be summarized
as follows. Begin with a set of numbers $c_0 < c_1 < \cdots < c_k$, where $c_0$ is the
smallest possible observation (often zero) and $c_k$ is the largest possible
observation (often infinity). From the sample, let $n_j$ be the number of observations
in the interval $(c_{j-1}, c_j]$. For such data, the likelihood function is

\[ L(\theta) = \prod_{j=1}^{k} \left[F(c_j|\theta) - F(c_{j-1}|\theta)\right]^{n_j} \]

and its logarithm is

\[ l(\theta) = \sum_{j=1}^{k} n_j \ln\left[F(c_j|\theta) - F(c_{j-1}|\theta)\right]. \]

Example 12.9 From Data Set C, determine the maximum likelihood estimate
for an exponential distribution.

The loglikelihood function is

\[ l(\theta) = 99\ln[F(7{,}500) - F(0)] + 42\ln[F(17{,}500) - F(7{,}500)] + \cdots + 3\ln[1 - F(300{,}000)] \]
\[ \phantom{l(\theta)} = 99\ln\left(1 - e^{-7{,}500/\theta}\right) + 42\ln\left(e^{-7{,}500/\theta} - e^{-17{,}500/\theta}\right) + \cdots + 3\ln e^{-300{,}000/\theta}. \]

A numerical routine is needed to produce $\hat{\theta} = 29{,}721$ and the value of the
loglikelihood function is −406.03. □
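The grouped-data loglikelihood can be coded directly from the formula of this subsection. The Python sketch below is illustrative only: Data Set C is not reproduced here, so a small hypothetical two-interval data set is used instead, chosen because its maximum likelihood estimate has a closed form against which the numerical answer can be checked. The function names and the search interval are assumptions.

```python
import math


def grouped_exponential_loglik(theta, boundaries, counts):
    """Sum of n_j * ln[F(c_j) - F(c_{j-1})] for an exponential model.

    Differences of survival functions are used, S(c_{j-1}) - S(c_j), which is
    the same quantity but avoids loss of precision in the right tail.
    """
    def survival(c):
        return 0.0 if math.isinf(c) else math.exp(-c / theta)

    total = 0.0
    for lower, upper, n_j in zip(boundaries[:-1], boundaries[1:], counts):
        if n_j > 0:
            total += n_j * math.log(survival(lower) - survival(upper))
    return total


def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


# Hypothetical grouped data: 60 observations in (0, 100], 40 above 100.
boundaries = [0.0, 100.0, math.inf]
counts = [60, 40]

theta_hat = golden_max(
    lambda t: grouped_exponential_loglik(t, boundaries, counts), 1.0, 1.0e4)

# With only two intervals the MLE solves exp(-100/theta) = 40/100 exactly.
theta_closed_form = -100.0 / math.log(0.4)
print(theta_hat, theta_closed_form)
```

The same two functions apply unchanged to a data set with more intervals, such as Data Set C, once its boundaries and counts are supplied.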


12.2.4 Truncated or censored data 


When data are censored, there is no additional complication. As noted in 
Example 12.7, right censoring simply creates an interval running from the 
censoring point to infinity. In that example, data below the censoring point 
were individual data, and so the likelihood function contains both density and 
distribution function terms. 

Truncated data present more of a challenge. There are two ways to pro- 
ceed. One is to shift the data by subtracting the truncation point from each 




observation. The other is to accept the fact that there is no information 
about values below the truncation point but then attempt to fit a model for 
the original population. 


Example 12.10 Assume the values in Data Set B had been truncated from
below at 200. Using both methods, estimate the value of $\alpha$ for a Pareto
distribution with $\theta = 800$ known. Then use the model to estimate the cost per
payment with deductibles of 0, 200, and 400.

Using the shifting approach, the values become 43, 94, 140, 184, 257, 480,
655, 677, 774, 993, 1,140, 1,684, 2,358, and 15,543. The likelihood function is

\[ L(\alpha) = \prod_{j=1}^{14} \frac{\alpha(800^{\alpha})}{(800 + x_j)^{\alpha+1}}, \]
\[ l(\alpha) = \sum_{j=1}^{14} \left[\ln\alpha + \alpha\ln 800 - (\alpha + 1)\ln(x_j + 800)\right] \]
\[ \phantom{l(\alpha)} = 14\ln\alpha + 93.5846\alpha - 103.969(\alpha + 1) \]
\[ \phantom{l(\alpha)} = 14\ln\alpha - 103.969 - 10.384\alpha, \]
\[ l'(\alpha) = 14\alpha^{-1} - 10.384, \]
\[ \hat{\alpha} = \frac{14}{10.384} = 1.3482. \]

Because the data have been shifted, it is not possible to estimate the cost with
no deductible. With a deductible of 200, the expected cost is the expected
value of the estimated Pareto distribution, 800/0.3482 = 2,298. Raising the
deductible to 400 is equivalent to imposing a deductible of 200 on the modeled
distribution. From Theorem 5.13, the expected cost per payment is

\[ \frac{E(X) - E(X \wedge 200)}{1 - F(200)} = \frac{\dfrac{800}{0.3482}\left(\dfrac{800}{200 + 800}\right)^{0.3482}}{\left(\dfrac{800}{200 + 800}\right)^{1.3482}} = \frac{1{,}000}{0.3482} = 2{,}872. \]


For the unshifted approach we need to ask the key question required when
constructing the likelihood function. That is, what is the probability of
observing each value knowing that values under 200 are omitted from the data
set? This becomes a conditional probability and therefore the likelihood function
is (where the $x_j$ values are now the original values)

\[ L(\alpha) = \prod_{j=1}^{14} \frac{f(x_j|\alpha)}{1 - F(200|\alpha)} = \prod_{j=1}^{14} \left[\frac{\alpha(800^{\alpha})}{(800 + x_j)^{\alpha+1}} \Big/ \left(\frac{800}{800 + 200}\right)^{\alpha}\right] = \prod_{j=1}^{14} \frac{\alpha(1{,}000^{\alpha})}{(800 + x_j)^{\alpha+1}}, \]
\[ l(\alpha) = 14\ln\alpha + 14\alpha\ln 1{,}000 - (\alpha + 1)\sum_{j=1}^{14} \ln(800 + x_j) \]
\[ \phantom{l(\alpha)} = 14\ln\alpha + 96.709\alpha - (\alpha + 1)105.810, \]
\[ l'(\alpha) = 14\alpha^{-1} - 9.101, \]
\[ \hat{\alpha} = 1.5383. \]


This model is for losses with no deductible, and therefore the expected cost
without a deductible is 800/0.5383 = 1,486. Imposing deductibles of 200 and
400 produces the following results:

\[ \frac{E(X) - E(X \wedge 200)}{1 - F(200)} = \frac{1{,}000}{0.5383} = 1{,}858, \]
\[ \frac{E(X) - E(X \wedge 400)}{1 - F(400)} = \frac{1{,}200}{0.5383} = 2{,}229. \]
□
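Both approaches in Example 12.10 reduce to the same closed form, which the following Python sketch makes explicit. The function names are hypothetical. The shifted values are those listed in the example; the original values are recovered by adding back the truncation point of 200. The cost-per-payment function uses the Pareto result $(\theta + d)/(\alpha - 1)$, which is what the ratios in the example simplify to.

```python
import math

THETA = 800.0
TRUNCATION = 200.0
SHIFTED = [43, 94, 140, 184, 257, 480, 655, 677, 774, 993,
           1140, 1684, 2358, 15543]
ORIGINAL = [x + TRUNCATION for x in SHIFTED]


def pareto_alpha_mle(values, theta, truncation=0.0):
    """MLE of alpha with theta known and data left truncated at `truncation`.

    Solves l'(alpha) = n/alpha - sum(ln((theta + x)/(theta + truncation))) = 0.
    """
    denominator = sum(math.log((theta + x) / (theta + truncation))
                      for x in values)
    return len(values) / denominator


def pareto_cost_per_payment(alpha, theta, deductible):
    """Expected payment per payment for a Pareto loss; requires alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the mean does not exist for alpha <= 1")
    return (theta + deductible) / (alpha - 1.0)


# Shifting approach: treat the shifted values as untruncated Pareto data.
alpha_shifted = pareto_alpha_mle(SHIFTED, THETA)
# Unshifted approach: original values, conditioning on exceeding 200.
alpha_unshifted = pareto_alpha_mle(ORIGINAL, THETA, TRUNCATION)

cost_0 = pareto_cost_per_payment(alpha_unshifted, THETA, 0.0)
cost_200 = pareto_cost_per_payment(alpha_unshifted, THETA, 200.0)
cost_400 = pareto_cost_per_payment(alpha_unshifted, THETA, 400.0)
print(alpha_shifted, alpha_unshifted, cost_0, cost_200, cost_400)
```

Only the unshifted model can price a deductible of 0, because it describes the original loss variable rather than the excess over 200.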


It should now be clear that the contribution to the likelihood function can 
be written for most any observation. The following two steps summarize the 
process: 


1. For the numerator, use $f(x)$ if the exact value, $x$, of the observation is
known. If it is only known that the observation is between $y$ and $z$, use
$F(z) - F(y)$.


2. For the denominator, let d be the truncation point (use zero if there is 
no truncation). The denominator is then 1 — F(d). 
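The two steps above can be sketched as a single Python function. The name `likelihood_contribution` and its keyword arguments are hypothetical, and the exponential model with mean 1,000 is used purely as an example of a density and distribution function to plug in.

```python
import math


def likelihood_contribution(pdf, cdf, x=None, y=None, z=math.inf, d=0.0):
    """Contribution of one observation to the likelihood function.

    x    : exact value, if known (then the numerator is f(x))
    y, z : interval endpoints if only y < X <= z is known (numerator F(z) - F(y))
    d    : truncation point (use zero if there is no truncation)
    """
    if x is not None:
        numerator = pdf(x)
    else:
        upper = 1.0 if math.isinf(z) else cdf(z)
        numerator = upper - cdf(y)
    return numerator / (1.0 - cdf(d))


# Example model (an assumption for illustration): exponential with mean 1,000.
MEAN = 1000.0


def exp_pdf(t):
    return math.exp(-t / MEAN) / MEAN


def exp_cdf(t):
    return 1.0 - math.exp(-t / MEAN)


exact = likelihood_contribution(exp_pdf, exp_cdf, x=300.0)
censored = likelihood_contribution(exp_pdf, exp_cdf, y=250.0)          # z = infinity
truncated = likelihood_contribution(exp_pdf, exp_cdf, x=300.0, d=200.0)
print(exact, censored, truncated)
```

Multiplying such contributions over all observations (or summing their logarithms) gives the likelihood for any mix of exact, censored, and truncated data.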


Example 12.11 Determine Pareto and gamma models for the time to death 
for Data Set D2. 


Table 12.3 shows how the likelihood function is constructed for these values.
For deaths, the time is known and so the exact value of $x$ is available. For
surrenders or those reaching time 5, the observation is censored and therefore
death is known to be some time in the interval from the surrender time, $y$, to
infinity. In the table, $z = \infty$ is not noted because all interval observations end
at infinity. The likelihood function must be maximized numerically. For the
Pareto distribution there is no solution. The likelihood function keeps getting
larger as $\alpha$ and $\theta$ get larger.⁶ For the gamma distribution the maximum is at
$\hat{\alpha} = 2.617$ and $\hat{\theta} = 3.311$. □

Table 12.3 Likelihood function for Example 12.11

Obs.   x, y      d     L
1      y = 0.1   0     1 − F(0.1)
2      y = 0.5   0     1 − F(0.5)
3      y = 0.8   0     1 − F(0.8)
4      x = 0.8   0     f(0.8)
5      y = 1.8   0     1 − F(1.8)
6      y = 1.8   0     1 − F(1.8)
7      y = 2.1   0     1 − F(2.1)
8      y = 2.5   0     1 − F(2.5)
9      y = 2.8   0     1 − F(2.8)
10     x = 2.9   0     f(2.9)
11     x = 2.9   0     f(2.9)
12     y = 3.9   0     1 − F(3.9)
13     x = 4.0   0     f(4.0)
14     y = 4.0   0     1 − F(4.0)
15     y = 4.1   0     1 − F(4.1)
...
36     y = 5.0   2.9   [1 − F(5.0)]/[1 − F(2.9)]
37     y = 4.8   2.9   [1 − F(4.8)]/[1 − F(2.9)]
38     x = 4.0   3.2   f(4.0)/[1 − F(3.2)]
39     y = 5.0   3.4   [1 − F(5.0)]/[1 − F(3.4)]
40     y = 5.0   3.9   [1 − F(5.0)]/[1 − F(3.9)]


Discrete data present no additional problems. 


Example 12.12 For Data Set A, assume that the seven drivers with five 
or more accidents all had exactly five accidents. Determine the maximum 
likelihood estimate for a Poisson distribution and for a binomial distribution 
with m = 8. 


In general, for a discrete distribution with complete data, the likelihood
function is

\[ L(\theta) = \prod_{j=1}^{k} \left[p(x_j|\theta)\right]^{n_{x_j}}, \]

where $x_j$ is one of the observed values, $p(x_j|\theta)$ is the probability of observing
$x_j$, and $n_x$ is the number of times $x$ was observed in the sample. For the


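6 For a Pareto distribution, the limit as the parameters $\alpha$ and $\theta$ become infinite with the ratio
being held constant is an exponential distribution. Thus, for this example, the exponential
distribution is a better model (as measured by the likelihood function) than any Pareto
model.

Poisson distribution

\[ L(\lambda) = \prod_{x=0}^{\infty} \left(\frac{e^{-\lambda}\lambda^{x}}{x!}\right)^{n_x} = \prod_{x=0}^{\infty} \frac{e^{-n_x\lambda}\lambda^{x n_x}}{(x!)^{n_x}}, \]
\[ l(\lambda) = \sum_{x=0}^{\infty} \left(-n_x\lambda + x n_x \ln\lambda - n_x \ln x!\right) = -n\lambda + n\bar{x}\ln\lambda - \sum_{x=0}^{\infty} n_x \ln x!, \]
\[ l'(\lambda) = -n + \frac{n\bar{x}}{\lambda} = 0, \]
\[ \hat{\lambda} = \bar{x}. \]

For the binomial distribution

\[ L(q) = \prod_{x=0}^{m} \left[\binom{m}{x} q^{x}(1 - q)^{m-x}\right]^{n_x} = \prod_{x=0}^{m} \frac{(m!)^{n_x} q^{x n_x}(1 - q)^{(m-x)n_x}}{(x!)^{n_x}[(m - x)!]^{n_x}}, \]
\[ l(q) = \sum_{x=0}^{m} \left[n_x \ln m! + x n_x \ln q + (m - x)n_x \ln(1 - q)\right] - \sum_{x=0}^{m} \left[n_x \ln x! + n_x \ln(m - x)!\right], \]
\[ l'(q) = \sum_{x=0}^{m} \left[\frac{x n_x}{q} - \frac{(m - x)n_x}{1 - q}\right] = \frac{n\bar{x}}{q} - \frac{mn - n\bar{x}}{1 - q} = 0, \]
\[ \hat{q} = \frac{\bar{x}}{m}. \]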


For this problem, $\bar{x} = [81{,}714(0) + 11{,}306(1) + 1{,}618(2) + 250(3) + 40(4) +
7(5)]/94{,}935 = 0.16313$. Therefore, for the Poisson distribution $\hat{\lambda} = 0.16313$,
and for the binomial distribution $\hat{q} = 0.16313/8 = 0.02039$. □
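The arithmetic of Example 12.12 is easy to reproduce. The Python sketch below uses the frequency counts quoted in the example (with the seven drivers in the top class assumed to have exactly five accidents, as the example states); the function names are hypothetical.

```python
# Number of drivers by number of accidents, as quoted in Example 12.12.
ACCIDENT_COUNTS = {0: 81714, 1: 11306, 2: 1618, 3: 250, 4: 40, 5: 7}


def mean_from_counts(counts):
    """Sample mean when the data are given as value -> frequency."""
    n = sum(counts.values())
    return sum(x * n_x for x, n_x in counts.items()) / n


def poisson_mle(counts):
    """Poisson: lambda-hat equals the sample mean."""
    return mean_from_counts(counts)


def binomial_mle(counts, m):
    """Binomial with m known: q-hat equals the sample mean divided by m."""
    if max(counts) > m:
        raise ValueError("an observed value exceeds m")
    return mean_from_counts(counts) / m


n_drivers = sum(ACCIDENT_COUNTS.values())
lambda_hat = poisson_mle(ACCIDENT_COUNTS)
q_hat = binomial_mle(ACCIDENT_COUNTS, 8)
print(n_drivers, round(lambda_hat, 5), round(q_hat, 5))
```

Both estimators depend on the data only through the sample mean, which is why the closed forms $\hat{\lambda} = \bar{x}$ and $\hat{q} = \bar{x}/m$ require no numerical maximization.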


In Exercise 12.25 you are asked to estimate the Poisson parameter when 
the actual values for those with five or more accidents are not known. 


12.2.5 Exercises 


12.20 Repeat Example 12.8 using the inverse exponential, inverse gamma 
with a = 2, and inverse gamma distributions. Compare your estimates with 
the method-of-moments estimates. 


12.21 From Data Set C, determine the maximum likelihood estimates for 
gamma, inverse exponential, and inverse gamma distributions. 


12.22 Determine maximum likelihood estimates for Data Set B using the
inverse exponential, gamma, and inverse gamma distributions. Assume the
data have been censored at 250 and then compare your answers to those
obtained in Example 12.8 and Exercise 12.20.

Table 12.4 Data for Exercise 12.27

Age last observed   Cause
1.7                 Death
1.5                 Censoring
2.6                 Censoring
3.3                 Death
3.5                 Censoring


12.23 Repeat Example 12.10 using a Pareto distribution with both parame- 
ters unknown. 


12.24 Repeat Example 12.11, this time finding the distribution of the time 
to surrender. 


12.25 Repeat Example 12.12, but this time assume that the actual values for 
the seven drivers who have five or more accidents are unknown. Note that 
this is a case of censoring. 


12.26 (*) Lives are observed in order to estimate $q_{35}$. Ten lives are first
observed at age 35.4 and 6 die prior to age 36 while the other 4 survive to age
36. An additional 20 lives are first observed at age 35 and 8 die prior to age
36 with the other 12 surviving to age 36. Determine the maximum likelihood
estimate of $q_{35}$ given that the time to death from age 35 has density function
$f(t) = w$, $0 < t < 1$, with $f(t)$ unspecified for $t > 1$.


12.27 (*) The model has hazard rate function $h(t) = \lambda_1$, $0 < t < 2$, and
$h(t) = \lambda_2$, $t > 2$. Five items are observed from age zero, with the results in
Table 12.4. Determine the maximum likelihood estimates of $\lambda_1$ and $\lambda_2$.


12.28 (*) Your goal is to estimate $q_x$. The time to death for a person age
$x$ has a constant density function. In a mortality study, 10 lives were first
observed at age $x$. Of them, 1 died and 1 was removed from observation alive
at age $x + 0.5$. Determine the maximum likelihood estimate of $q_x$.


12.29 (*) Ten lives are subject to the survival function

$$S(t) = \left(1 - \frac{t}{k}\right)^{1/2}, \quad 0 \le t \le k,$$

where t is time since birth. There are 10 lives observed from birth. At time 10,
2 of the lives die and the other 8 are withdrawn from observation. Determine
the maximum likelihood estimate of k.


12.30 (*) Five hundred losses are observed. Five of the losses are 1,100,
3,200, 3,300, 3,500, and 3,900. All that is known about the other 495 losses 
is that they exceed 4,000. Determine the maximum likelihood estimate of the 
mean of an exponential model. 


12.31 (*) One hundred people are observed at age 35. Of them, 15 leave 
the study at age 35.6, 10 die sometime between ages 35 and 35.6, and 3 die 
sometime after age 35.6 but before age 36. The remaining 72 people survive 
to age 36. Determine the product-limit estimate of q35 and the maximum 
likelihood estimate of q35. For the latter, assume the time to death is uniform 
between ages 35 and 36. 


12.32 (*) The survival function is $S(t) = 1 - t/\omega$, $0 \le t \le \omega$. Five claims
were studied in order to estimate the distribution of the time from reporting
to settlement. After five years, four of the claims were settled, the times
being 1, 3, 4, and 4. Actuary X then estimates $\omega$ using maximum likelihood.
Actuary Y prefers to wait until all claims are settled. The fifth claim is settled
after six years, at which time actuary Y estimates $\omega$ by maximum likelihood.
Determine the two estimates.


12.33 (*) Four automobile engines were first observed when they were three
years old. They were then observed for r additional years. By that time three
of the engines had failed, with the failure ages being 4, 5, and 7. The fourth
engine was still working at age 3 + r. The survival function has the uniform
distribution on the interval 0 to $\omega$. The maximum likelihood estimate of $\omega$ is
13.67. Determine r.


12.34 (*) Ten claims were observed. The values of seven of them (in thousands)
were 3, 7, 8, 12, 12, 13, and 14. The remaining three claims were all
censored at 15. The proposed model has a hazard rate function given by

$$h(t) = \begin{cases} \lambda_1, & 0 < t < 5, \\ \lambda_2, & 5 \le t < 10, \\ \lambda_3, & t \ge 10. \end{cases}$$

Determine the maximum likelihood estimates of the three parameters.


12.35 (*) You are given the five observations 521, 658, 702, 819, and 1,217.
Your model is the single-parameter Pareto distribution with distribution
function

$$F(x) = 1 - \left(\frac{500}{x}\right)^{\alpha}, \quad x > 500, \ \alpha > 0.$$

Determine the maximum likelihood estimate of $\alpha$.


12.36 (*) You have observed the following five claim severities: 11.0, 15.2,
18.0, 21.0, and 25.8. Determine the maximum likelihood estimate of $\mu$ for the
following model:

$$f(x) = \frac{1}{\sqrt{2\pi x}} \exp\left[-\frac{1}{2x}(x - \mu)^2\right], \quad x, \mu > 0.$$


12.37 (*) A random sample of size 5 is taken from a Weibull distribution
with $\tau = 2$. Two of the sample observations are known to exceed 50 and the
three remaining observations are 20, 30, and 45. Determine the maximum
likelihood estimate of $\theta$.


12.38 (*) Phil and Sylvia are competitors in the light bulb business. Sylvia
advertises that her light bulbs burn twice as long as Phil’s. You were able
to test 20 of Phil’s bulbs and 10 of Sylvia’s. You assumed that both of their
bulbs have an exponential distribution with time measured in hours. You have
separately estimated the parameters as $\hat{\theta}_P = 1{,}000$ and $\hat{\theta}_S = 1{,}500$ for Phil
and Sylvia respectively, using maximum likelihood. Using all 30 observations,
determine $\hat{\theta}^*$, the maximum likelihood estimate of $\theta_P$ restricted by Sylvia’s
claim that $\theta_S = 2\theta_P$.


12.39 (*) A sample of 100 losses revealed that 62 were below 1,000 and
38 were above 1,000. An exponential distribution with mean $\theta$ is considered.
Using only the given information, determine the maximum likelihood estimate
of $\theta$. Now suppose you are also given that the 62 losses that were below 1,000
totalled 28,140 while the total for the 38 above 1,000 remains unknown. Using
this additional information, determine the maximum likelihood estimate of $\theta$.


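12.40 (*) The following values were calculated from a random sample of 10
losses:

$$\sum_{j=1}^{10} x_j^{-2} = 0.00033674, \quad \sum_{j=1}^{10} x_j^{-1} = 0.023999,$$
$$\sum_{j=1}^{10} x_j^{-0.5} = 0.34445, \quad \sum_{j=1}^{10} x_j^{0.5} = 488.97,$$
$$\sum_{j=1}^{10} x_j = 31{,}939, \quad \sum_{j=1}^{10} x_j^{2} = 211{,}498{,}983.$$

Losses come from a Weibull distribution with $\tau = 0.5$ [so $F(x) = 1 - e^{-(x/\theta)^{0.5}}$].
Determine the maximum likelihood estimate of $\theta$.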


12.41 (*) For claims reported in 1997, the number settled in 1997 (year 0) 
was unknown, the number settled in 1998 (year 1) was 3, and the number 
settled in 1999 (year 2) was 1. The number settled after 1999 is unknown. 
For claims reported in 1998 there were 5 settled in year 0, 2 settled in year 1, 
and the number settled after year 1 is unknown. For claims reported in 1999 
there were 4 settled in year 0 and the number settled after year 0 is unknown. 
Let N be the year in which a randomly selected claim is settled and assume 
that it has probability function $\Pr(N = n) = p_n = (1 - p)p^n$, $n = 0, 1, 2, \ldots$.
Determine the maximum likelihood estimate of p. 


12.42 (*) A sample of n independent observations $x_1, \ldots, x_n$ came from a
distribution with a pdf of $f(x) = 2\theta x \exp(-\theta x^2)$, $x > 0$. Determine the
maximum likelihood estimator (mle) of $\theta$.


12.43 (*) Let $x_1, \ldots, x_n$ be a random sample from a population with cdf
$F(x) = x^p$, $0 < x < 1$. Determine the mle of p.


12.44 A random sample of 10 claims obtained from a gamma distribution is 
given below: 


1,500 6,000 3,500 3,800 1,800 5,500 4,800 4,200 3,900 3,000 


(a) (*) Suppose it is known that $\alpha = 12$. Determine the maximum
likelihood estimate of $\theta$.


(b) Determine the maximum likelihood estimates of $\alpha$ and $\theta$.


12.45 A random sample of five claims from a lognormal distribution is given 
below: 


500 1,000 1,500 2,500 4,500 


Estimate $\mu$ and $\sigma$ by maximum likelihood. Estimate the probability that
a loss will exceed 4,500.


12.46 (*) Let $x_1, \ldots, x_n$ be a random sample from a random variable with
pdf $f(x) = \theta^{-1} e^{-x/\theta}$, $x > 0$. Determine the maximum likelihood estimator
of $\theta$.


12.47 (*) The random variable X has pdf $f(x) = \beta^{-2} x \exp(-0.5x^2/\beta^2)$, $x, \beta > 0$.
For this random variable, $E(X) = (\beta/2)\sqrt{2\pi}$ and $\mathrm{Var}(X) = 2\beta^2 - \pi\beta^2/2$.
You are given the following five observations:

4.9 1.8 3.4 6.9 4.0


Determine the maximum likelihood estimate of $\beta$.


12.48 (*) Let $x_1, \ldots, x_n$ be a random sample from a random variable with
cdf $F(x) = 1 - x^{-\alpha}$, $x > 1$, $\alpha > 0$. Determine the maximum likelihood
estimator of $\alpha$.


12.49 (*) The random variable X has pdf $f(x) = \alpha\lambda^{\alpha}(\lambda + x)^{-\alpha-1}$, $x, \alpha, \lambda > 0$.
It is known that $\lambda = 1{,}000$. You are given the following five observations:


43 145 233 396 775


Determine the maximum likelihood estimate of $\alpha$.


Table 12.5 Data for Exercise 12.51

Loss       No. of observations    Loss             No. of observations
0-25       5                      350-500
25-50      37                     500-750
50-75      28                     750-1,000        12
75-100     31                     1,000-1,500      3
100-125    23                     1,500-2,500      5
125-150    9                      2,500-5,000      5
150-200    22                     5,000-10,000     3
                                  10,000-25,000    3
                                  25,000-          2


12.50 The following 20 observations were collected. It is desired to estimate
Pr(X > 200). When a parametric model is called for, use the single-parameter
Pareto distribution for which $F(x) = 1 - (100/x)^{\alpha}$, $x > 100$, $\alpha > 0$.


132 149 476 147 135 110 176 107 147 165 
135 117 110 111 226 108 102 108 227 102 


(a) Determine the empirical estimate of Pr(X > 200). 


(b) Determine the method-of-moments estimate of the single-parameter
Pareto parameter $\alpha$ and use it to estimate Pr(X > 200).


(c) Determine the maximum likelihood estimate of the single-parameter
Pareto parameter $\alpha$ and use it to estimate Pr(X > 200).


12.51 The data in Table 12.5 present the results of a sample of 250 losses.
Consider the inverse exponential distribution with cdf $F(x) = e^{-\theta/x}$, $x > 0$,
$\theta > 0$. Determine the maximum likelihood estimate of $\theta$.


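12.52 Consider the inverse Gaussian distribution with density given by

$$f_X(x) = \left(\frac{\theta}{2\pi x^3}\right)^{1/2} \exp\left[-\frac{\theta}{2x}\left(\frac{x - \mu}{\mu}\right)^2\right], \quad x > 0.$$

(a) Show that

$$\sum_{j=1}^n \frac{(x_j - \mu)^2}{x_j} = \mu^2 \sum_{j=1}^n \left(\frac{1}{x_j} - \frac{1}{\bar{x}}\right) + \frac{n}{\bar{x}}(\bar{x} - \mu)^2,$$

where $\bar{x} = (1/n)\sum_{j=1}^n x_j$.

(b) For a sample $(x_1, \ldots, x_n)$, show that the maximum likelihood
estimates of $\mu$ and $\theta$ are

$$\hat{\mu} = \bar{x}$$

and

$$\hat{\theta} = \frac{n}{\sum_{j=1}^n \left(\dfrac{1}{x_j} - \dfrac{1}{\bar{x}}\right)}.$$


12.53 Suppose that $X_1, \ldots, X_n$ are independent and normally distributed
with mean $E(X_j) = \mu$ and $\mathrm{Var}(X_j) = (\theta m_j)^{-1}$, where $m_j > 0$ is a known
constant. Prove that the maximum likelihood estimates of $\mu$ and $\theta$ are

$$\hat{\mu} = \bar{X}$$

and

$$\hat{\theta} = n\left[\sum_{j=1}^n m_j (X_j - \bar{X})^2\right]^{-1},$$

where $\bar{X} = (1/m)\sum_{j=1}^n m_j X_j$ and $m = \sum_{j=1}^n m_j$.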


12.3 VARIANCE AND INTERVAL ESTIMATION 


In general, it is not easy to determine the variance of complicated estimators
such as the maximum likelihood estimator. However, it is possible to
approximate the variance. The key is a theorem that can be found in most
mathematical statistics books. The particular version stated here and its
multi-parameter generalization are taken from [112] and stated without proof.
Recall that $L(\theta)$ is the likelihood function and $l(\theta)$ its logarithm. All of the
results assume that the population has a distribution that is a member of the
chosen parametric family.


Theorem 12.13 Assume that the pdf (pf in the discrete case) $f(x;\theta)$ satisfies
the following for $\theta$ in an interval containing the true value (replace
integrals by sums for discrete variables):

(i) $\ln f(x;\theta)$ is three times differentiable with respect to $\theta$.

(ii) $\int \frac{\partial}{\partial\theta} f(x;\theta)\,dx = 0$. This implies that the derivative may be taken
outside the integral and so we are just differentiating the constant 1.⁷

(iii) $\int \frac{\partial^2}{\partial\theta^2} f(x;\theta)\,dx = 0$. This is the same concept for the second derivative.

(iv) $-\infty < \int f(x;\theta) \frac{\partial^2}{\partial\theta^2} \ln f(x;\theta)\,dx < 0$. This establishes that the indicated
integral exists and that the location where the derivative is zero is a
maximum.

(v) There exists a function $H(x)$ such that $\int H(x) f(x;\theta)\,dx < \infty$ with
$\left|\frac{\partial^3}{\partial\theta^3} \ln f(x;\theta)\right| < H(x)$. This makes sure that the population is not
overpopulated with regard to extreme values.

⁷The integrals in (ii) and (iii) are to be evaluated over the range of x values for which
$f(x;\theta) > 0$.


Then the following results hold: 


(a) As $n \to \infty$, the probability that the likelihood equation [$L'(\theta) = 0$] has a
solution goes to 1.


(b) As $n \to \infty$, the distribution of the maximum likelihood estimator $\hat{\theta}_n$
converges to a normal distribution with mean $\theta$ and variance such that
$I(\theta)\,\mathrm{Var}(\hat{\theta}_n) \to 1$, where

$$I(\theta) = -nE\left[\frac{\partial^2}{\partial\theta^2} \ln f(X;\theta)\right] = -n\int f(x;\theta) \frac{\partial^2}{\partial\theta^2} \ln f(x;\theta)\,dx$$
$$\phantom{I(\theta)} = nE\left[\left(\frac{\partial}{\partial\theta} \ln f(X;\theta)\right)^2\right] = n\int f(x;\theta) \left(\frac{\partial}{\partial\theta} \ln f(x;\theta)\right)^2 dx.$$

For any z, the last statement is to be interpreted as

$$\lim_{n\to\infty} \Pr\left(\frac{\hat{\theta}_n - \theta}{[I(\theta)]^{-1/2}} < z\right) = \Phi(z)$$

and therefore $[I(\theta)]^{-1}$ is a useful approximation for $\mathrm{Var}(\hat{\theta}_n)$. The quantity
$I(\theta)$ is called the information (sometimes more specifically, Fisher’s
information). It follows from this result that the maximum likelihood estimator
is asymptotically unbiased and consistent. The conditions in statements (i)-(v)
are often referred to as “mild regularity conditions.” A skeptic would
translate this statement as “conditions that are almost always true but are
often difficult to establish, so we’ll just assume they hold in our case.” Their
purpose is to ensure that the density function is fairly smooth with regard to
changes in the parameter and that there is nothing unusual about the density
itself.
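
As a short worked illustration (a sketch added here, not part of the theorem), consider the exponential density $f(x;\theta) = \theta^{-1}e^{-x/\theta}$, for which the regularity conditions are assumed to hold. The information and the resulting variance approximation follow directly from the definition in part (b):

```latex
\begin{align*}
\ln f(x;\theta) &= -\ln\theta - \frac{x}{\theta}, \\
\frac{\partial^2}{\partial\theta^2}\ln f(x;\theta) &= \frac{1}{\theta^2} - \frac{2x}{\theta^3}, \\
I(\theta) &= -nE\left[\frac{1}{\theta^2} - \frac{2X}{\theta^3}\right]
           = -n\left(\frac{1}{\theta^2} - \frac{2\theta}{\theta^3}\right)
           = \frac{n}{\theta^2}, \\
\mathrm{Var}(\hat{\theta}_n) &\approx [I(\theta)]^{-1} = \frac{\theta^2}{n}.
\end{align*}
```

In this case the approximation is exact, because the maximum likelihood estimator is the sample mean and $\mathrm{Var}(\bar{X}) = \theta^2/n$.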

The results stated above assume that the sample consists of independent
and identically distributed random observations. A more general version of
the result uses the logarithm of the likelihood function:

$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} l(\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta} l(\theta)\right)^2\right].$$

The only requirement here is that the same parameter value apply to each
observation.

⁸For an example of a situation where these conditions do not hold, see Exercise 12.55.

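If there is more than one parameter, the only change is that the vector
of maximum likelihood estimates now has an asymptotic multivariate normal
distribution. The covariance matrix⁹ of this distribution is obtained from the
inverse of the matrix with (r, s)th element,

$$I(\theta)_{rs} = -E\left[\frac{\partial^2}{\partial\theta_s\,\partial\theta_r} l(\theta)\right] = -nE\left[\frac{\partial^2}{\partial\theta_s\,\partial\theta_r} \ln f(X;\theta)\right]$$
$$\phantom{I(\theta)_{rs}} = E\left[\frac{\partial}{\partial\theta_r} l(\theta)\,\frac{\partial}{\partial\theta_s} l(\theta)\right] = nE\left[\frac{\partial}{\partial\theta_r} \ln f(X;\theta)\,\frac{\partial}{\partial\theta_s} \ln f(X;\theta)\right].$$
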

The first expression on each line is always correct. The second expression 
assumes that the likelihood is the product of n identical densities. This ma- 
trix is often called the information matrix. The information matrix also 
forms the Cramér-Rao lower bound. That is, under the usual conditions, no 
unbiased estimator has a smaller variance than that given by the inverse of 
the information. Therefore, at least asymptotically, no unbiased estimator is 
more accurate than the maximum likelihood estimator. 


Example 12.14 Estimate the covariance matrix of the maximum likelihood
estimator for the lognormal distribution. Then apply this result to Data Set
B.


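The likelihood function and its logarithm are

$$L(\mu,\sigma) = \prod_{j=1}^n \frac{1}{x_j \sigma \sqrt{2\pi}} \exp\left[-\frac{(\ln x_j - \mu)^2}{2\sigma^2}\right],$$
$$l(\mu,\sigma) = \sum_{j=1}^n \left[-\ln x_j - \ln\sigma - \frac{1}{2}\ln(2\pi) - \frac{1}{2}\left(\frac{\ln x_j - \mu}{\sigma}\right)^2\right].$$

The first partial derivatives are

$$\frac{\partial l}{\partial\mu} = \sum_{j=1}^n \frac{\ln x_j - \mu}{\sigma^2} \quad\text{and}\quad \frac{\partial l}{\partial\sigma} = -\frac{n}{\sigma} + \sum_{j=1}^n \frac{(\ln x_j - \mu)^2}{\sigma^3}.$$

The second partial derivatives are

$$\frac{\partial^2 l}{\partial\mu^2} = -\frac{n}{\sigma^2},$$
$$\frac{\partial^2 l}{\partial\sigma\,\partial\mu} = -2\sum_{j=1}^n \frac{\ln x_j - \mu}{\sigma^3},$$
$$\frac{\partial^2 l}{\partial\sigma^2} = \frac{n}{\sigma^2} - 3\sum_{j=1}^n \frac{(\ln x_j - \mu)^2}{\sigma^4}.$$

The expected values are ($\ln X_j$ has a normal distribution with mean $\mu$ and
standard deviation $\sigma$)

$$E\left[\frac{\partial^2 l}{\partial\mu^2}\right] = -\frac{n}{\sigma^2}, \quad E\left[\frac{\partial^2 l}{\partial\sigma\,\partial\mu}\right] = 0, \quad E\left[\frac{\partial^2 l}{\partial\sigma^2}\right] = -\frac{2n}{\sigma^2}.$$

Changing the signs and inverting produce an estimate of the covariance matrix
(it is an estimate because Theorem 12.13 only provides the covariance matrix
in the limit). It is

$$\begin{bmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{\sigma^2}{2n} \end{bmatrix}.$$

⁹For any multivariate random variable the covariance matrix has the variances of the
individual random variables on the main diagonal and covariances in the off-diagonal positions.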

For the lognormal distribution, the maximum likelihood estimates are the
solutions to the two equations

$$\sum_{j=1}^n \frac{\ln x_j - \mu}{\sigma^2} = 0 \quad\text{and}\quad -\frac{n}{\sigma} + \sum_{j=1}^n \frac{(\ln x_j - \mu)^2}{\sigma^3} = 0.$$

From the first equation $\hat{\mu} = (1/n)\sum_{j=1}^n \ln x_j$, and from the second equation
$\hat{\sigma}^2 = (1/n)\sum_{j=1}^n (\ln x_j - \hat{\mu})^2$. For Data Set B the values are $\hat{\mu} = 6.1379$ and
$\hat{\sigma}^2 = 1.9305$ or $\hat{\sigma} = 1.3894$. With regard to the covariance matrix the true
values are needed. The best we can do is substitute the estimated values to
obtain

$$\widehat{\mathrm{Var}}(\hat{\mu}, \hat{\sigma}) = \begin{bmatrix} 0.0965 & 0 \\ 0 & 0.0483 \end{bmatrix}. \qquad (12.1)$$

The multiple “hats” in the expression indicate that this is an estimate of the
variance of the estimators. □

The zeros off the diagonal indicate that the two parameter estimates are
asymptotically uncorrelated. For the particular case of the lognormal
distribution, that is also true for any sample size. One thing we could do with this
information is construct approximate 95% confidence intervals for the true
parameter values. These would be 1.96 standard deviations on either side of
the estimate:

$$\mu: \ 6.1379 \pm 1.96(0.0965)^{1/2} = 6.1379 \pm 0.6089,$$
$$\sigma: \ 1.3894 \pm 1.96(0.0483)^{1/2} = 1.3894 \pm 0.4308.$$
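
The covariance matrix and the two intervals can be reproduced with a few lines of Python. This is a minimal sketch: it takes n = 20 and the estimates quoted in Example 12.14 as given, and the function names are illustrative rather than taken from the text.

```python
import math

# Values quoted in Example 12.14 for Data Set B (n = 20 observations).
n = 20
mu_hat = 6.1379       # maximum likelihood estimate of mu
sigma2_hat = 1.9305   # maximum likelihood estimate of sigma^2
sigma_hat = math.sqrt(sigma2_hat)

def lognormal_mle_covariance(sigma2, n):
    """Inverse of the lognormal information matrix for (mu, sigma)."""
    return [[sigma2 / n, 0.0],
            [0.0, sigma2 / (2 * n)]]

def normal_interval(estimate, variance, z=1.96):
    """Return (lower, upper) = estimate -/+ z standard deviations."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

cov = lognormal_mle_covariance(sigma2_hat, n)
mu_interval = normal_interval(mu_hat, cov[0][0])
sigma_interval = normal_interval(sigma_hat, cov[1][1])

print("covariance matrix:", cov)
print("interval for mu:", mu_interval)
print("interval for sigma:", sigma_interval)
```

Because the inputs are rounded to four decimals, the half-widths agree with the values in the text only to about three decimal places.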


To obtain the information matrix, it is necessary to take both derivatives 
and expected values. This is not always easy to do. A way to avoid this 
problem is to simply not take the expected value. Rather than working with 
the number that results from the expectation, use the observed data points. 
The result is called the observed information. 


Example 12.15 Estimate the covariance in the previous example using the 
observed information. 


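Substituting the observations into the second derivatives produces

$$\frac{\partial^2 l}{\partial\mu^2} = -\frac{n}{\sigma^2} = -\frac{20}{\sigma^2},$$
$$\frac{\partial^2 l}{\partial\sigma\,\partial\mu} = -2\sum_{j=1}^n \frac{\ln x_j - \mu}{\sigma^3} = -2\,\frac{122.7576 - 20\mu}{\sigma^3},$$
$$\frac{\partial^2 l}{\partial\sigma^2} = \frac{n}{\sigma^2} - 3\sum_{j=1}^n \frac{(\ln x_j - \mu)^2}{\sigma^4} = \frac{20}{\sigma^2} - 3\,\frac{792.0801 - 245.5152\mu + 20\mu^2}{\sigma^4}.$$

Inserting the parameter estimates produces the negatives of the entries of the
observed information,

$$\frac{\partial^2 l}{\partial\mu^2} = -10.3600, \quad \frac{\partial^2 l}{\partial\sigma\,\partial\mu} = 0, \quad \frac{\partial^2 l}{\partial\sigma^2} = -20.7190.$$

Changing the signs and inverting produce the same values as in (12.1). This is
a feature of the lognormal distribution that need not hold for other models. □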
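
A hedged sketch of the same calculation in Python follows. It works only from the sums quoted in the example (n = 20, the sum of ln x_j equal to 122.7576, and the sum of (ln x_j)^2 equal to 792.0801); the function names are illustrative.

```python
# Summary statistics quoted in Example 12.15 for Data Set B.
n = 20
sum_log = 122.7576      # sum of ln x_j
sum_log_sq = 792.0801   # sum of (ln x_j)^2

def lognormal_second_derivatives(mu, sigma):
    """Second partial derivatives of the lognormal loglikelihood."""
    s1 = sum_log - n * mu                               # sum of (ln x_j - mu)
    s2 = sum_log_sq - 2.0 * mu * sum_log + n * mu ** 2  # sum of (ln x_j - mu)^2
    d_mu_mu = -n / sigma ** 2
    d_sigma_mu = -2.0 * s1 / sigma ** 3
    d_sigma_sigma = n / sigma ** 2 - 3.0 * s2 / sigma ** 4
    return d_mu_mu, d_sigma_mu, d_sigma_sigma

def invert_symmetric_2x2(a, b, d):
    """Invert the symmetric matrix [[a, b], [b, d]]."""
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]

# Maximum likelihood estimates recomputed from the sums.
mu_hat = sum_log / n
sigma_hat = ((sum_log_sq - n * mu_hat ** 2) / n) ** 0.5

d_mm, d_sm, d_ss = lognormal_second_derivatives(mu_hat, sigma_hat)
# The observed information is the negative of the second-derivative matrix.
covariance = invert_symmetric_2x2(-d_mm, -d_sm, -d_ss)

print("second derivatives:", d_mm, d_sm, d_ss)
print("estimated covariance matrix:", covariance)
```

Since the sums are rounded, the printed second derivatives match the values in the text to roughly two or three decimal places rather than exactly.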


Sometimes it is not even possible to take the derivative. In that case an
approximate second derivative can be used. A reasonable approximation is

$$\frac{\partial^2 f(\theta)}{\partial\theta_i\,\partial\theta_j} \doteq \frac{1}{h_i h_j}\Big[f(\theta + \tfrac{1}{2}h_i e_i + \tfrac{1}{2}h_j e_j) - f(\theta + \tfrac{1}{2}h_i e_i - \tfrac{1}{2}h_j e_j)$$
$$\qquad\qquad - f(\theta - \tfrac{1}{2}h_i e_i + \tfrac{1}{2}h_j e_j) + f(\theta - \tfrac{1}{2}h_i e_i - \tfrac{1}{2}h_j e_j)\Big],$$

where $e_i$ is a vector with all zeros except for a 1 in the ith position and
$h_i = \theta_i/10^v$, where v is one-third the number of significant digits used in
calculations.


Example 12.16 Repeat the previous example using approximate derivatives. 


Assume that there are 15 significant digits being used. Then $h_1 = 6.1379/10^5$
and $h_2 = 1.3894/10^5$. Reasonably close values are 0.00006 and 0.00001. The
first approximation is

$$\frac{\partial^2 l}{\partial\mu^2} \doteq \frac{l(6.13796, 1.3894) - 2l(6.1379, 1.3894) + l(6.13784, 1.3894)}{(0.00006)^2}$$
$$= \frac{-157.71389308198 - 2(-157.71389304968) + (-157.71389305468)}{(0.00006)^2}$$
$$= -10.3604.$$

The other two approximations are

$$\frac{\partial^2 l}{\partial\sigma\,\partial\mu} \doteq 0.0003, \quad \frac{\partial^2 l}{\partial\sigma^2} \doteq -20.7208.$$

We see that here the approximation works very well. □
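
The approximate-derivative formula can be sketched in Python as follows. The loglikelihood is written in terms of the sums quoted in Example 12.15 rather than the raw data, and the step sizes use $h_i = \theta_i/10^5$ exactly instead of the rounded values 0.00006 and 0.00001; both choices, and the function names, are assumptions made for this illustration.

```python
import math

# Summary statistics quoted in Example 12.15 for Data Set B.
n = 20
sum_log = 122.7576      # sum of ln x_j
sum_log_sq = 792.0801   # sum of (ln x_j)^2

def loglik(theta):
    """Lognormal loglikelihood l(mu, sigma) written in terms of the sums."""
    mu, sigma = theta
    ss = sum_log_sq - 2.0 * mu * sum_log + n * mu ** 2
    return (-sum_log - n * math.log(sigma)
            - 0.5 * n * math.log(2.0 * math.pi) - ss / (2.0 * sigma ** 2))

def approx_second_derivative(f, theta, i, j, h):
    """Four-point approximation to the (i, j) second partial derivative."""
    def shifted(sign_i, sign_j):
        point = list(theta)
        point[i] += sign_i * 0.5 * h[i]
        point[j] += sign_j * 0.5 * h[j]
        return f(point)
    numerator = (shifted(1, 1) - shifted(1, -1)
                 - shifted(-1, 1) + shifted(-1, -1))
    return numerator / (h[i] * h[j])

theta_hat = [6.1379, 1.3894]
h = [t / 10 ** 5 for t in theta_hat]   # v = 5 for 15 significant digits

d_mm = approx_second_derivative(loglik, theta_hat, 0, 0, h)
d_sm = approx_second_derivative(loglik, theta_hat, 1, 0, h)
d_ss = approx_second_derivative(loglik, theta_hat, 1, 1, h)
print(d_mm, d_sm, d_ss)
```

When i = j the four points collapse to $f(\theta + h_i e_i) - 2f(\theta) + f(\theta - h_i e_i)$, which is the form used for the first approximation in the example.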


The information matrix provides a method for assessing the quality of the
maximum likelihood estimators of a distribution’s parameters. However, we
are often more interested in a quantity that is a function of the parameters.
For example, we might be interested in the lognormal mean as an estimate of
the population mean. That is, we want to use $\exp(\hat{\mu} + \hat{\sigma}^2/2)$ as an estimate
of the population mean, where the maximum likelihood estimates of the
parameters are used. It is very difficult to evaluate the mean and variance of
this random variable because it is a complex function of two variables that
already have complex distributions. The following theorem (from [108]) can
help. The method is often called the delta method.


Theorem 12.17 Let $X_n = (X_{1n}, \ldots, X_{kn})^T$ be a multivariate random variable
of dimension k based on a sample of size n. Assume that X is asymptotically
normal with mean $\theta$ and covariance matrix $\Sigma/n$, where neither $\theta$ nor
$\Sigma$ depend on n. Let g be a function of k variables that is totally differentiable.
Let $G_n = g(X_{1n}, \ldots, X_{kn})$. Then $G_n$ is asymptotically normal with mean
$g(\theta)$ and variance $(\partial g)^T \Sigma (\partial g)/n$, where $\partial g$ is the vector of first derivatives,
that is, $\partial g = (\partial g/\partial\theta_1, \ldots, \partial g/\partial\theta_k)^T$ and it is to be evaluated at $\theta$, the true
parameters of the original random variable.


The statement of the theorem is hard to decipher. The Xs are the estimators
and g is the function of the parameters that are being estimated. For a
model with one parameter, the theorem reduces to the following statement:
Let $\hat{\theta}$ be an estimator of $\theta$ that has an asymptotic normal distribution with
mean $\theta$ and variance $\sigma^2/n$. Then $g(\hat{\theta})$ has an asymptotic normal distribution
with mean $g(\theta)$ and asymptotic variance $[g'(\theta)](\sigma^2/n)[g'(\theta)] = g'(\theta)^2\sigma^2/n$.


Example 12.18 Use the delta method to approximate the variance of the
maximum likelihood estimator of the probability that an observation from an
exponential distribution exceeds 200. Apply this result to Data Set B.


From Example 12.8 we know that the maximum likelihood estimate of
the exponential parameter is the sample mean. We are asked to estimate
$p = \Pr(X > 200) = \exp(-200/\theta)$. The maximum likelihood estimate is
$\hat{p} = \exp(-200/\hat{\theta}) = \exp(-200/\bar{x})$. Determining the mean and variance of
this quantity is not easy. But we do know that $\mathrm{Var}(\bar{X}) = \mathrm{Var}(X)/n = \theta^2/n$.
Furthermore,

$$g(\theta) = e^{-200/\theta}, \quad g'(\theta) = 200\theta^{-2}e^{-200/\theta},$$

and therefore the delta method gives

$$\mathrm{Var}(\hat{p}) \doteq \frac{(200\theta^{-2}e^{-200/\theta})^2\theta^2}{n} = \frac{40{,}000\,\theta^{-2}e^{-400/\theta}}{n}.$$

For Data Set B,

$$\bar{x} = 1{,}424.4,$$
$$\hat{p} = \exp\left(-\frac{200}{1{,}424.4}\right) = 0.869,$$
$$\widehat{\mathrm{Var}}(\hat{p}) = \frac{40{,}000(1{,}424.4)^{-2}\exp(-400/1{,}424.4)}{20} = 0.0007444.$$

A 95% confidence interval for p is $0.869 \pm 1.96\sqrt{0.0007444}$ or $0.869 \pm 0.053$. □
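
The arithmetic in this example can be checked with a short Python sketch. The helper `delta_method_variance` is a hypothetical name for the quadratic form in Theorem 12.17; the inputs n = 20 and the sample mean 1,424.4 are the values quoted above.

```python
import math

def delta_method_variance(gradient, covariance):
    """Quadratic form gradient' * covariance * gradient from Theorem 12.17."""
    k = len(gradient)
    return sum(gradient[r] * covariance[r][s] * gradient[s]
               for r in range(k) for s in range(k))

n = 20
theta_hat = 1424.4   # sample mean of Data Set B, the exponential mle
limit = 200.0

# g(theta) = exp(-limit / theta) and its derivative, evaluated at theta_hat.
p_hat = math.exp(-limit / theta_hat)
g_prime = limit * theta_hat ** -2 * p_hat

# Var(theta_hat) = theta^2 / n, with theta replaced by its estimate.
var_p = delta_method_variance([g_prime], [[theta_hat ** 2 / n]])
half_width = 1.96 * math.sqrt(var_p)

print("p_hat:", p_hat)
print("estimated variance:", var_p)
print("95% interval:", p_hat - half_width, p_hat + half_width)
```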


Example 12.19 Construct a 95% confidence interval for the mean of a log- 
normal population using Data Set B. Compare this to the more traditional 
confidence interval based on the sample mean. 


From Example 12.14 we have $\hat{\mu} = 6.1379$ and $\hat{\sigma} = 1.3894$ and an estimated
covariance matrix of

$$\frac{\hat{\Sigma}}{n} = \begin{bmatrix} 0.0965 & 0 \\ 0 & 0.0483 \end{bmatrix}.$$

The function is $g(\mu,\sigma) = \exp(\mu + \sigma^2/2)$. The partial derivatives are

$$\frac{\partial g}{\partial\mu} = \exp\left(\mu + \tfrac{1}{2}\sigma^2\right),$$
$$\frac{\partial g}{\partial\sigma} = \sigma\exp\left(\mu + \tfrac{1}{2}\sigma^2\right),$$

and the estimates of these quantities are 1,215.75 and 1,689.16, respectively.
The delta method produces the following approximation:

$$\widehat{\mathrm{Var}}[g(\hat{\mu},\hat{\sigma})] = \begin{bmatrix} 1{,}215.75 & 1{,}689.16 \end{bmatrix} \begin{bmatrix} 0.0965 & 0 \\ 0 & 0.0483 \end{bmatrix} \begin{bmatrix} 1{,}215.75 \\ 1{,}689.16 \end{bmatrix} = 280{,}444.$$

The confidence interval is $1{,}215.75 \pm 1.96\sqrt{280{,}444}$ or $1{,}215.75 \pm 1{,}037.96$.
The customary confidence interval for a population mean is $\bar{x} \pm 1.96s/\sqrt{n}$,
where $s^2$ is the sample variance. For Data Set B the interval is $1{,}424.4 \pm
1.96(3{,}435.04)/\sqrt{20}$ or $1{,}424.4 \pm 1{,}505.47$. It is not surprising that this is a
wider interval because we know that (for a lognormal population) the maximum
likelihood estimator is asymptotically UMVUE. □
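
A minimal Python sketch of the two-parameter delta method calculation is given below. It uses the rounded estimates and covariance matrix quoted in the example, plus the sample mean and standard deviation quoted for the customary interval; the helper name `delta_method_variance` is illustrative.

```python
import math

def delta_method_variance(gradient, covariance):
    """Quadratic form gradient' * covariance * gradient from Theorem 12.17."""
    k = len(gradient)
    return sum(gradient[r] * covariance[r][s] * gradient[s]
               for r in range(k) for s in range(k))

# Estimates and estimated covariance matrix quoted from Example 12.14.
mu_hat, sigma_hat = 6.1379, 1.3894
covariance = [[0.0965, 0.0],
              [0.0, 0.0483]]

# g(mu, sigma) = exp(mu + sigma^2 / 2) and its gradient at the estimates.
mean_hat = math.exp(mu_hat + 0.5 * sigma_hat ** 2)
gradient = [mean_hat, sigma_hat * mean_hat]

var_mean = delta_method_variance(gradient, covariance)
delta_half_width = 1.96 * math.sqrt(var_mean)

# Customary interval based on the sample mean and standard deviation.
n, x_bar, s = 20, 1424.4, 3435.04
sample_half_width = 1.96 * s / math.sqrt(n)

print("delta method:", mean_hat, "+/-", delta_half_width)
print("sample mean :", x_bar, "+/-", sample_half_width)
```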


12.3.1 Exercises 


12.54 Determine 95% confidence intervals for the parameters of exponential 
and gamma models for Data Set B. The likelihood function and maximum 
likelihood estimates were determined in Example 12.8. 


12.55 Let X have a uniform distribution on the interval from 0 to $\theta$. Show
that the maximum likelihood estimator is $\hat{\theta} = \max(X_1, \ldots, X_n)$. Use
Examples 9.7 and 9.10 to show that this estimator is asymptotically unbiased and
to obtain its variance. Show that Theorem 12.13 yields a negative estimate
of the variance and that item (ii) in the conditions does not hold.


12.56 Use the delta method to construct a 95% confidence interval for the 
mean of a gamma distribution using Data Set B. Preliminary calculations are 
in Exercise 12.54. 


12.57 (*) For a lognormal distribution with parameters $\mu$ and $\sigma$ you are given
that the maximum likelihood estimates are $\hat{\mu} = 4.215$ and $\hat{\sigma} = 1.093$. The
estimated covariance matrix of $(\hat{\mu}, \hat{\sigma})$ is

$$\begin{bmatrix} 0.1195 & 0 \\ 0 & 0.0597 \end{bmatrix}.$$

The mean of a lognormal distribution is given by $\exp(\mu + \sigma^2/2)$. Estimate the
variance of the maximum likelihood estimator of the mean of this lognormal
distribution using the delta method.


12.58 (*) A distribution has two parameters, $\alpha$ and $\beta$. A sample of size 10
produced the following loglikelihood function:

$$l(\alpha, \beta) = -2.5\alpha^2 - 3\alpha\beta - \beta^2 + 50\alpha + 2\beta + k,$$

where k is a constant. Estimate the covariance matrix of the maximum
likelihood estimator $(\hat{\alpha}, \hat{\beta})$.


12.59 In Exercise 12.39 two maximum likelihood estimates were obtained for
the same model. The second estimate was based on more information than 
the first one. It would be reasonable to expect that the second estimate is 
more accurate. Confirm this by estimating the variance of each of the two 
estimators. Do your calculations using the observed likelihood. 


12.60 This is a continuation of Exercise 12.43. Let $x_1, \ldots, x_n$ be a random
sample from a population with cdf $F(x) = x^p$, $0 < x < 1$.


(a) Determine the asymptotic variance of the maximum likelihood es- 
timator of p. 

(b) Use your answer to obtain a general formula for a 95% confidence 
interval for p. 


(c) Determine the maximum likelihood estimator of E(X) and obtain 
its asymptotic variance and a formula for a 95% confidence interval. 


12.61 This is a continuation of Exercise 12.46. Let $x_1, \ldots, x_n$ be a random
sample from a population with pdf $f(x) = \theta^{-1}e^{-x/\theta}$, $x > 0$.


(a) Determine the asymptotic variance of the maximum likelihood
estimator of $\theta$.

(b) (*) Use your answer to obtain a general formula for a 95%
confidence interval for $\theta$.


(c) Determine the maximum likelihood estimator of Var(X) and obtain 
its asymptotic variance and a formula for a 95% confidence interval. 


12.62 (*) A sample of size 40 has been taken from a population with pdf
$f(x) = (2\pi\theta)^{-1/2}e^{-x^2/(2\theta)}$, $-\infty < x < \infty$, $\theta > 0$. The maximum likelihood
estimate of $\theta$ is $\hat{\theta} = 2$. Approximate the MSE of $\hat{\theta}$.


12.63 Four observations were made from a random variable having the density
function $f(x) = 2\lambda x e^{-\lambda x^2}$, $x, \lambda > 0$. Exactly one of the four observations
was less than 2.


(a) (*) Determine the maximum likelihood estimator of $\lambda$.


(b) Approximate the variance of the maximum likelihood estimator of
$\lambda$.

12.64 Estimate the covariance matrix of the maximum likelihood estimators
for the data in Exercise 12.44 with both $\alpha$ and $\theta$ unknown. Do this by
computing approximate derivatives of the loglikelihood. Then construct a 95%
confidence interval for the mean.


12.65 Estimate the variance of the maximum likelihood estimator for
Exercise 12.49 and use it to construct a 95% confidence interval for $E(X \wedge 500)$.


12.66 Consider a random sample of size n from a Weibull distribution. For
this exercise, write the Weibull survival function as

$$S(x) = \exp\left\{-\left[\frac{\Gamma(1 + \tau^{-1})\,x}{\mu}\right]^{\tau}\right\}.$$

For this exercise, assume that $\tau$ is known and that only $\mu$ is to be estimated.

(a) Show that $E(X) = \mu$.

(b) Show that the maximum likelihood estimate of $\mu$ is

$$\hat{\mu} = \Gamma(1 + \tau^{-1})\left(\frac{1}{n}\sum_{j=1}^n x_j^{\tau}\right)^{1/\tau}.$$

(c) Show that using the observed information produces the variance
estimate

$$\widehat{\mathrm{Var}}(\hat{\mu}) = \frac{\hat{\mu}^2}{n\tau^2},$$

where $\mu$ is replaced by $\hat{\mu}$.

(d) Show that using the information (again replacing $\mu$ with $\hat{\mu}$)
produces the same variance estimate as in part (c).

(e) Show that $\hat{\mu}$ has a transformed gamma distribution with $\alpha = n$,
$\theta = \mu n^{-1/\tau}$, and $\tau = \tau$. Use this to obtain the exact variance of $\hat{\mu}$
(as a function of $\mu$). Hint: The variable $X^{\tau}$ has an exponential
distribution and so the variable $\sum_{j=1}^n X_j^{\tau}$ has a gamma distribution
with first parameter equal to n and second parameter equal to the
mean of the exponential distribution.


12.4 BAYESIAN ESTIMATION 


All of the previous discussion on estimation has assumed a frequentist ap- 
proach. That is, the population distribution is fixed but unknown, and our 
decisions are concerned not only with the sample we obtained from the pop- 
ulation but also with the possibilities attached to other samples that might 
have been obtained. The Bayesian approach assumes that only the data ac- 
tually observed are relevant and it is the population that is variable. For 
parameter estimation the following definitions describe the process and then 
Bayes’ theorem provides the solution. 


12.4.1 Definitions and Bayes’ theorem 


Definition 12.20 The prior distribution is a probability distribution over
the space of possible parameter values. It is denoted $\pi(\theta)$ and represents our
opinion concerning the relative chances that various values of $\theta$ are the true
value of the parameter.


As before, the parameter $\theta$ may be scalar or vector valued. Determination
of the prior distribution has always been one of the barriers to the wide- 
spread acceptance of Bayesian methods. It is almost certainly the case that 
your experience has provided some insights about possible parameter values 
before the first data point has been observed. (If you have no such opin- 
ions, perhaps the wisdom of the person who assigned this task to you should 
be questioned.) The difficulty is translating this knowledge into a probabil- 
ity distribution. An excellent discussion about prior distributions and the 
foundations of Bayesian analysis can be found in Lindley [83], and for a dis- 
cussion about issues surrounding the choice of Bayesian versus frequentist 
methods, see Efron [32]. The book by Klugman [77] contains more detail on 
the Bayesian approach along with several actuarial applications. More re- 
cent articles applying Bayesian methods to actuarial problems include [25], 
[101], [119], and [133]. A good source for a thorough mathematical treatment 
of Bayesian methods is the text by Berger [13]. In recent years many ad- 
vancements in Bayesian calculations have occurred. A good resource is [22]. 
Scollnik [118] has demonstrated how the computer program WINBUGS can 
be used to provide Bayesian solutions to actuarial problems. 

Due to the difficulty of finding a prior distribution that is convincing (you 
will have to convince others that your prior opinions are valid) and the pos- 
sibility that you may really have no prior opinion, the definition of prior 
distribution can be loosened. 


Definition 12.21 An improper prior distribution is one for which the 
probabilities (or pdf) are nonnegative but their sum (or integral) is infinite. 


A great deal of research has gone into the determination of a so-called
noninformative or vague prior. Its purpose is to reflect minimal knowledge.
Universal agreement on the best way to construct a vague prior does not exist.
However, there is agreement that the appropriate noninformative prior for a
scale parameter is $\pi(\theta) = 1/\theta$, $\theta > 0$. Note that this is an improper prior.

For a Bayesian analysis, the model is no different than before.


Definition 12.22 The model distribution is the probability distribution for
the data as collected given a particular value for the parameter. Its pdf is
denoted $f_{X|\Theta}(x|\theta)$, where vector notation for x is used to remind us that all
the data appear here. Also note that this is identical to the likelihood function
and so that name may also be used at times.


If the vector of observations $x = (x_1, \ldots, x_n)^T$ consists of independent and
identically distributed random variables, then

$$f_{X|\Theta}(x|\theta) = f_{X|\Theta}(x_1|\theta) \cdots f_{X|\Theta}(x_n|\theta).$$

We use concepts from multivariate statistics to obtain two more definitions.
In both cases, as well as in the following, integrals should be replaced by sums
if the distributions are discrete.


Definition 12.23 The joint distribution has pdf

$$f_{X,\Theta}(x, \theta) = f_{X|\Theta}(x|\theta)\pi(\theta).$$

Definition 12.24 The marginal distribution of x has pdf

$$f_X(x) = \int f_{X|\Theta}(x|\theta)\pi(\theta)\,d\theta.$$

Compare this definition to that of a mixture distribution given by (4.4) on
page 59. The final two quantities of interest are the following.


Definition 12.25 The posterior distribution is the conditional probability
distribution of the parameters given the observed data. It is denoted $\pi_{\Theta|X}(\theta|x)$.


Definition 12.26 The predictive distribution is the conditional probability
distribution of a new observation y given the data x. It is denoted
$f_{Y|X}(y|x)$.


These last two items are the key output of a Bayesian analysis. The pos- 
terior distribution tells us how our opinion about the parameter has changed 
once we have observed the data. The predictive distribution tells us what 
the next observation might look like given the information contained in the 
data (as well as, implicitly, our prior opinion). Bayes’ theorem tells us how 
to compute the posterior distribution. 


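Theorem 12.27 The posterior distribution can be computed as

$$\pi_{\Theta|X}(\theta|x) = \frac{f_{X|\Theta}(x|\theta)\pi(\theta)}{\int f_{X|\Theta}(x|\theta)\pi(\theta)\,d\theta} \qquad (12.2)$$

while the predictive distribution can be computed as

$$f_{Y|X}(y|x) = \int f_{Y|\Theta}(y|\theta)\pi_{\Theta|X}(\theta|x)\,d\theta, \qquad (12.3)$$

where $f_{Y|\Theta}(y|\theta)$ is the pdf of the new observation, given the parameter value.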


¹⁰In this section and in any subsequent Bayesian discussions, we reserve $f(\cdot)$ for
distributions concerning observations (such as the model and predictive distributions) and $\pi(\cdot)$ for
distributions concerning parameters (such as the prior and posterior distributions). The
arguments will usually make it clear which particular distribution is being used. To make
matters explicit, we also employ subscripts to enable us to keep track of the random
variables.

The predictive distribution can be interpreted as a mixture distribution
where the mixing is with respect to the posterior distribution. The following 
example illustrates the above definitions and results. The setting, though not 
the data, is taken from Meyers [92]. 


Example 12.28 The following amounts were paid on a hospital liability pol- 
icy: 


125 132 141 107 133 319 126 104 145 223 


The amount of a single payment has the single-parameter Pareto distribution 
with @ = 100 and œ unknown. The prior distribution has the gamma distribu- 
tion with a = 2 and 0 = 1. Determine all of the relevant Bayesian quantities. 


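The prior density has a gamma distribution and is

$$\pi(\alpha) = \alpha e^{-\alpha}, \quad \alpha > 0,$$

while the model is (evaluated at the data points)

$$f_{\mathbf{X}|A}(\mathbf{x}|\alpha) = \frac{\alpha^{10}(100)^{10\alpha}}{\prod_{j=1}^{10} x_j^{\alpha+1}} = \alpha^{10}e^{-3.801121\alpha - 49.852823}.$$

The joint density of $\mathbf{x}$ and $A$ is (again evaluated at the data points)

$$f_{\mathbf{X},A}(\mathbf{x},\alpha) = \alpha^{11}e^{-4.801121\alpha - 49.852823}.$$

The posterior distribution of $\alpha$ is

$$\pi_{A|\mathbf{X}}(\alpha|\mathbf{x}) = \frac{\alpha^{11}e^{-4.801121\alpha - 49.852823}}{\int_0^\infty \alpha^{11}e^{-4.801121\alpha - 49.852823}\,d\alpha} = \frac{\alpha^{11}e^{-4.801121\alpha}}{(11!)(1/4.801121)^{12}}. \tag{12.4}$$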


There is no need to evaluate the integral in the denominator. Because we 
know that the result must be a probability distribution, the denominator is 
just the appropriate normalizing constant. A look at the numerator reveals 
that we have a gamma distribution with $\alpha = 12$ and $\theta = 1/4.801121$. 
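
The posterior parameters can be checked numerically. The following is a minimal sketch, assuming the gamma prior is written with a rate (the reciprocal of the scale $\theta$); the function name `pareto_gamma_posterior` is introduced here for illustration only and is not part of the text.

```python
import math

# Payments from Example 12.28 and the known Pareto scale (theta = 100).
payments = [125, 132, 141, 107, 133, 319, 126, 104, 145, 223]
pareto_theta = 100.0

# Gamma prior on alpha: shape 2 and scale 1, so the rate is also 1.
prior_shape, prior_rate = 2.0, 1.0


def pareto_gamma_posterior(data, scale, shape0, rate0):
    """Return (shape, rate) of the gamma posterior for the Pareto alpha.

    The likelihood is proportional to alpha^n * exp(-alpha * sum(ln(x_j/scale))),
    so a gamma prior stays gamma: the shape grows by n and the rate grows
    by the sum of the log excess ratios.
    """
    log_excess = sum(math.log(x / scale) for x in data)
    return shape0 + len(data), rate0 + log_excess


post_shape, post_rate = pareto_gamma_posterior(
    payments, pareto_theta, prior_shape, prior_rate
)
print(post_shape, round(post_rate, 6))
```

The printed rate should agree with the value 4.801121 appearing in (12.4), and the shape with $\alpha = 12$.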

The predictive distribution is

$$\begin{aligned}
f_{Y|\mathbf{X}}(y|\mathbf{x}) &= \int_0^\infty \frac{\alpha 100^{\alpha}}{y^{\alpha+1}}\,\frac{\alpha^{11}e^{-4.801121\alpha}}{(11!)(1/4.801121)^{12}}\,d\alpha \\
&= \frac{1}{y(11!)(1/4.801121)^{12}} \int_0^\infty \alpha^{12}e^{-(0.195951+\ln y)\alpha}\,d\alpha \\
&= \frac{1}{y(11!)(1/4.801121)^{12}}\,\frac{(12!)}{(0.195951+\ln y)^{13}} \\
&= \frac{12(4.801121)^{12}}{y(0.195951+\ln y)^{13}}, \quad y > 100.
\end{aligned} \tag{12.5}$$

While this density function may not look familiar, you are asked to show in 
Exercise 12.67 that $\ln Y - \ln 100$ has the Pareto distribution. □


12.4.2 Inference and prediction 


In one sense the analysis is complete. We begin with a distribution that 
quantifies our knowledge about the parameter and/or the next observation 
and we end with a revised distribution. But we have a suspicion that your 
boss may not be satisfied if you produce a distribution in response to his or 
her request. No doubt a specific number, perhaps with a margin for error, is 
what is desired. The usual Bayesian solution is to pose a loss function. 


Definition 12.29 A loss function $l_j(\hat{\theta}_j, \theta_j)$ describes the penalty paid by 
the investigator when $\hat{\theta}_j$ is the estimate and $\theta_j$ is the true value of the jth 
parameter. 


It would also be possible to have a multidimensional loss function $l(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})$ 
which allowed the loss to depend simultaneously on the errors in the various 
parameter estimates. 


Definition 12.30 The Bayes estimate for a given loss function is the one 
that minimizes the expected loss given the posterior distribution of the para- 
meter in question. 


The three most commonly used loss functions are defined as follows. 


Definition 12.31 For squared-error loss the loss function is (all subscripts 
are dropped for convenience) $l(\hat{\theta},\theta) = (\hat{\theta}-\theta)^2$. For absolute loss it is 
$l(\hat{\theta},\theta) = |\hat{\theta}-\theta|$. For zero-one loss it is $l(\hat{\theta},\theta) = 0$ if $\hat{\theta} = \theta$ and is 1 
otherwise. 


The following theorem indicates the Bayes estimates for these three com- 
mon loss functions. 


Theorem 12.32 For squared-error loss the Bayes estimate is the mean of 
the posterior distribution, for absolute loss it is a median, and for zero-one 
loss it is a mode. 


Note that there is no guarantee that the posterior mean exists or that the 
posterior median or mode will be unique. When not otherwise specified, the 
term Bayes estimate will refer to the posterior mean. 


Example 12.33 (Example 12.28 continued) Determine the three Bayes estimates
of $\alpha$. 


The mean of the posterior gamma distribution is $\alpha\theta = 12/4.801121 = 
2.499416$. The median of 2.430342 must be determined numerically while the 
mode is $(\alpha - 1)\theta = 11/4.801121 = 2.291132$. Note that the $\alpha$ used here is 
the parameter of the posterior gamma distribution, not the $\alpha$ for the
single-parameter Pareto distribution that we are trying to estimate. □

For forecasting purposes, the expected value of the predictive distribution 
is often of interest. It can be thought of as providing a point estimate of the 
(n+1)th observation given the first n observations and the prior distribution. 
It is 

$$\begin{aligned}
E(Y|\mathbf{x}) &= \int y f_{Y|\mathbf{X}}(y|\mathbf{x})\,dy \\
&= \int y \int f_{Y|\Theta}(y|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta\,dy \\
&= \int \pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) \int y f_{Y|\Theta}(y|\theta)\,dy\,d\theta \\
&= \int E(Y|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta.
\end{aligned} \tag{12.6}$$


Equation (12.6) can be interpreted as a weighted average using the posterior 
distribution as weights. 


Example 12.34 (Example 12.28 continued) Determine the expected value of 
the 11th observation, given the first 10. 


For the single-parameter Pareto distribution, $E(Y|\alpha) = 100\alpha/(\alpha - 1)$ for 
$\alpha > 1$. Because the posterior distribution assigns positive probability to values 
of $\alpha < 1$, the expected value of the predictive distribution is not defined. □
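
The point estimates of Example 12.33 and the observation in Example 12.34 can be checked with a short script. This is a sketch under stated assumptions: it relies on the posterior shape being the integer 12, so that the gamma cdf reduces to a finite Poisson sum, and the helper names `gamma_cdf` and `bisect` are introduced here for illustration.

```python
import math

SHAPE = 12         # posterior gamma shape (an integer in this example)
RATE = 4.801121    # posterior gamma rate, the reciprocal of the scale


def gamma_cdf(x, shape=SHAPE, rate=RATE):
    """Gamma cdf for integer shape: 1 - exp(-z) * sum_{k<shape} z^k / k!."""
    if x <= 0:
        return 0.0
    z = rate * x
    term, total = 1.0, 1.0
    for k in range(1, shape):
        term *= z / k
        total += term
    return 1.0 - math.exp(-z) * total


def bisect(func, lo, hi, tol=1e-10):
    """Root of an increasing function on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


post_mean = SHAPE / RATE                  # squared-error loss
post_mode = (SHAPE - 1) / RATE            # zero-one loss
post_median = bisect(lambda a: gamma_cdf(a) - 0.5, 0.0, 20.0)  # absolute loss

# Posterior probability that alpha <= 1, where E(Y | alpha) does not exist.
prob_alpha_le_1 = gamma_cdf(1.0)

print(post_mean, post_median, post_mode, prob_alpha_le_1)
```

The last quantity is small but positive, which is why the predictive mean fails to exist even though each of the three Bayes estimates of $\alpha$ exceeds 1.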


The Bayesian equivalent of a confidence interval is easy to construct. The 
following definition will suffice. 


Definition 12.35 The points $a < b$ define a $100(1-\alpha)\%$ credibility
interval for $\theta_j$ provided that $\Pr(a \le \Theta_j \le b \mid \mathbf{x}) \ge 1 - \alpha$. 


The use of the term credibility has no relationship to its use in actuarial 
analyses as developed in Chapter 16. The inequality is present for the case 
where the posterior distribution of $\theta_j$ is discrete. Then it may not be possible 
for the probability to be exactly $1 - \alpha$. This definition does not produce a 
unique solution. The following theorem indicates one way to produce a unique 
interval. 


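Theorem 12.36 If the posterior random variable $\Theta_j|\mathbf{x}$ is continuous and
unimodal, then the $100(1-\alpha)\%$ credibility interval with smallest width $b - a$ is 
the unique solution to 

$$\int_a^b \pi_{\Theta_j|\mathbf{X}}(\theta_j|\mathbf{x})\,d\theta_j = 1 - \alpha,$$

$$\pi_{\Theta|\mathbf{X}}(a|\mathbf{x}) = \pi_{\Theta|\mathbf{X}}(b|\mathbf{x}).$$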


This interval is a special case of a highest posterior density (HPD) credibility 
set. 


Fig. 12.1 Two Bayesian credibility intervals. [Plot of the posterior density with the HPD interval and the equal-probability interval marked.]


The following example may clarify the theorem. 


Example 12.37 (Example 12.28 continued) Determine the shortest 95%
credibility interval for the parameter $\alpha$. Also determine the interval that places 
2.5% probability at each end. 


The two equations from Theorem 12.36 are 

$$\Pr(a \le A \le b \mid \mathbf{x}) = \Gamma(12; 4.801121b) - \Gamma(12; 4.801121a) = 0.95,$$

$$a^{11}e^{-4.801121a} = b^{11}e^{-4.801121b},$$

and numerical methods can be used to find the solution a = 1.1832 and 
b = 3.9384. The width of this interval is 2.7552. 
Placing 2.5% probability at each end yields the two equations 

$$\Gamma(12; 4.801121b) = 0.975, \qquad \Gamma(12; 4.801121a) = 0.025.$$


This solution requires either access to the inverse of the incomplete gamma 
function or the use of root-finding techniques with the incomplete gamma 
function itself. The solution is a = 1.2915 and b = 4.0995. The width is 
2.8080, wider than the first interval. Figure 12.1 shows the difference in the 
two intervals. The solid vertical bars represent the HPD interval. The total 
area to the left and right of these bars is 0.05. Any other 95% interval must 
also have this probability. To create the interval with 0.025 probability on each 
side, both bars must be moved to the right. To subtract the same probability 
on the right end that is added on the left end, the right limit must be moved a 
greater distance because the posterior density is lower over that interval than 
it is on the left end. This must lead to a wider interval. □
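
Both intervals can be obtained with nothing more than bisection. The sketch below is one possible numerical approach, not necessarily the one used for the values above; it assumes the integer posterior shape 12 (so the incomplete gamma function is a finite sum), and the helper names are illustrative.

```python
import math

SHAPE, RATE = 12, 4.801121        # posterior gamma from Example 12.28
MODE = (SHAPE - 1) / RATE


def gamma_cdf(x):
    """Posterior cdf, using the finite-sum form valid for integer shape."""
    if x <= 0:
        return 0.0
    z = RATE * x
    term, total = 1.0, 1.0
    for k in range(1, SHAPE):
        term *= z / k
        total += term
    return 1.0 - math.exp(-z) * total


def log_density(x):
    """Log of the posterior density, up to an additive constant."""
    return (SHAPE - 1) * math.log(x) - RATE * x


def solve(func, lo, hi, iters=200):
    """Bisection; assumes func changes sign exactly once on [lo, hi]."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_match(a):
    """Point b above the mode with the same density as a below the mode."""
    target = log_density(a)
    return solve(lambda b: log_density(b) - target, MODE, 50.0)


def coverage_gap(a):
    """Posterior probability of (a, upper_match(a)) minus 0.95."""
    return gamma_cdf(upper_match(a)) - gamma_cdf(a) - 0.95


# HPD interval: equal densities at both ends and 95% coverage.
hpd_a = solve(coverage_gap, 0.01, MODE - 1e-6)
hpd_b = upper_match(hpd_a)

# Equal-probability interval: 2.5% in each tail.
eq_a = solve(lambda x: gamma_cdf(x) - 0.025, 0.0, 50.0)
eq_b = solve(lambda x: gamma_cdf(x) - 0.975, 0.0, 50.0)

print(hpd_a, hpd_b, hpd_b - hpd_a)
print(eq_a, eq_b, eq_b - eq_a)
```

The outer bisection works because widening the interval (lowering `a`) can only increase its coverage, so `coverage_gap` has a single sign change.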


The following definition provides the equivalent result for any posterior 
distribution. 


Definition 12.38 For any posterior distribution the $100(1-\alpha)\%$ HPD
credibility set is the set of parameter values C such that 

$$\Pr(\theta_j \in C) \ge 1 - \alpha \tag{12.7}$$

and 

$$C = \{\theta_j : \pi_{\Theta_j|\mathbf{X}}(\theta_j|\mathbf{x}) \ge c\} \text{ for some } c,$$

where c is the largest value for which the inequality (12.7) holds. 


This set may be the union of several intervals (which can happen with a 
multimodal posterior distribution). This definition produces the set of mini- 
mum total width that has the required posterior probability. Construction of 
the set is done by starting with a high value of c and then lowering it. As it 
decreases, the set C gets larger, as does the probability. The process continues
until the probability reaches $1 - \alpha$. It should be obvious how the 
definition can be extended to the construction of a simultaneous credibility 
set for a vector of parameters, $\boldsymbol{\theta}$. 

Sometimes it is the case that, while computing posterior probabilities is 
difficult, computing posterior moments may be easy. We can then use the 
Bayesian central limit theorem. The following is a paraphrase from Berger 
[13], p. 224. 


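Theorem 12.39 If $\pi(\boldsymbol{\theta})$ and $f_{\mathbf{X}|\Theta}(\mathbf{x}|\boldsymbol{\theta})$ are both twice differentiable in the
elements of $\boldsymbol{\theta}$ and other commonly satisfied assumptions hold, then the posterior 
distribution of $\Theta$ given $\mathbf{X} = \mathbf{x}$ is asymptotically normal. 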


The “commonly satisfied assumptions” are like those in Theorem 12.13. As 
in that theorem, it is possible to do further approximations. In particular, the 
asymptotic normal distribution also results if the posterior mode is substituted 
for the posterior mean and/or if the posterior covariance matrix is estimated 
by inverting the matrix of second partial derivatives of the negative logarithm 
of the posterior density. 


Example 12.40 (Example 12.28 continued) Construct a 95% credibility
interval for $\alpha$ using the Bayesian central limit theorem. 


The posterior distribution has a mean of 2.499416 and a variance of $\alpha\theta^2 = 
0.520590$. Using the normal approximation, the credibility interval is $2.499416 \pm
1.96(0.520590)^{1/2}$, which produces a = 1.0852 and b = 3.9136. This interval 
(with regard to the normal approximation) is HPD due to the symmetry of 
the normal distribution. 

Alternatively, the approximation can be centered at the posterior mode of 2.291132
(see Example 12.33). The second derivative of the negative logarithm of the posterior 
density [from (12.4)] is 

$$-\frac{d^2}{d\alpha^2}\ln\left[\frac{\alpha^{11}e^{-4.801121\alpha}}{(11!)(1/4.801121)^{12}}\right] = \frac{11}{\alpha^2}.$$

The variance estimate is the reciprocal. Evaluated at the modal estimate of 
$\alpha$ we get $(2.291132)^2/11 = 0.477208$ for a credibility interval of $2.291132 \pm
1.96(0.477208)^{1/2}$, which produces a = 0.9372 and b = 3.6451. □
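
The arithmetic behind both normal approximations is short enough to script. This is an illustrative sketch; `normal_interval` is a name introduced here, and 1.96 is the usual two-sided 95% normal quantile used in the example.

```python
import math

SHAPE, RATE = 12, 4.801121   # posterior gamma from Example 12.28
Z_95 = 1.96                  # two-sided 95% standard normal quantile


def normal_interval(center, variance, z=Z_95):
    """Symmetric interval center +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return center - half_width, center + half_width


# Version 1: exact posterior mean and variance of the gamma posterior.
post_mean = SHAPE / RATE
post_var = SHAPE / RATE ** 2
mean_based = normal_interval(post_mean, post_var)

# Version 2: posterior mode, with the variance taken as the reciprocal of
# the second derivative of the negative log posterior, 11 / alpha^2.
post_mode = (SHAPE - 1) / RATE
mode_var = post_mode ** 2 / (SHAPE - 1)
mode_based = normal_interval(post_mode, mode_var)

print(mean_based)
print(mode_based)
```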


The same concepts can apply to the predictive distribution. However, 
the Bayesian central limit theorem does not help here because the predictive 
sample has only one member. The only potential use for it is that for a large 
original sample size we can replace the true posterior distribution in (12.3) 
with a multivariate normal distribution. 


Example 12.41 (Example 12.28 continued) Construct a 95% highest density 
prediction interval for the next observation. 


It is easy to see that the predictive density function (12.5) is strictly
decreasing. Therefore the region with highest density runs from a = 100 to b. 
The value of b is determined from 

$$\begin{aligned}
0.95 &= \int_{100}^{b} \frac{12(4.801121)^{12}}{y(0.195951+\ln y)^{13}}\,dy \\
&= \int_0^{\ln(b/100)} \frac{12(4.801121)^{12}}{(4.801121+x)^{13}}\,dx \\
&= 1 - \left[\frac{4.801121}{4.801121+\ln(b/100)}\right]^{12},
\end{aligned}$$

and the solution is b = 390.1840. It is interesting to note that the mode of 
the predictive distribution is 100 (because the pdf is strictly decreasing) while 
the mean is infinite (with $b = \infty$ and an additional y in the integrand, after 
the transformation, the integrand is like $e^x x^{-13}$, which goes to infinity as x 
goes to infinity). □


The following example revisits a calculation done in Section 4.6.3. There 
the negative binomial distribution was derived as a gamma mixture of Poisson 
variables. The following example shows how the same calculations arise in a 
Bayesian context. 


Example 12.42 The number of claims in one year on a given policy is known 
to have a Poisson distribution. The parameter is not known, but the prior 
distribution has a gamma distribution with parameters $\alpha$ and $\theta$. Suppose in 
the past year the policy had x claims. Use Bayesian methods to estimate the 
number of claims in the next year. Then repeat these calculations assuming 
claim counts for the past n years, $x_1, \ldots, x_n$. 


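The key distributions are (where $x = 0, 1, \ldots$, and $\lambda, \alpha, \theta > 0$): 

Prior:
$$\pi(\lambda) = \frac{\lambda^{\alpha-1}e^{-\lambda/\theta}}{\Gamma(\alpha)\theta^{\alpha}}$$

Model:
$$p(x|\lambda) = \frac{\lambda^{x}e^{-\lambda}}{x!}$$

Joint:
$$p(x,\lambda) = \frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}}{x!\,\Gamma(\alpha)\theta^{\alpha}}$$

Marginal:
$$\begin{aligned}
p(x) &= \int_0^\infty \frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}}{x!\,\Gamma(\alpha)\theta^{\alpha}}\,d\lambda \\
&= \frac{\Gamma(x+\alpha)}{x!\,\Gamma(\alpha)\theta^{\alpha}(1+1/\theta)^{x+\alpha}} \\
&= \binom{x+\alpha-1}{x}\left(\frac{1}{1+\theta}\right)^{\alpha}\left(\frac{\theta}{1+\theta}\right)^{x}
\end{aligned}$$

Posterior:
$$\begin{aligned}
\pi(\lambda|x) &= \frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}}{x!\,\Gamma(\alpha)\theta^{\alpha}} \bigg/ \frac{\Gamma(x+\alpha)}{x!\,\Gamma(\alpha)\theta^{\alpha}(1+1/\theta)^{x+\alpha}} \\
&= \frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}(1+1/\theta)^{x+\alpha}}{\Gamma(x+\alpha)}
\end{aligned}$$

The marginal distribution is negative binomial with $r = \alpha$ and $\beta = \theta$. The 
posterior distribution is gamma with shape parameter "$\alpha$" equal to $x + \alpha$ and 
scale parameter "$\theta$" equal to $(1+1/\theta)^{-1} = \theta/(1+\theta)$. The Bayes estimate 
of the Poisson parameter is the posterior mean, $(x+\alpha)\theta/(1+\theta)$. For the 
predictive distribution, (12.3) gives 

$$\begin{aligned}
p(y|x) &= \int_0^\infty \frac{\lambda^{y}e^{-\lambda}}{y!}\,\frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}(1+1/\theta)^{x+\alpha}}{\Gamma(x+\alpha)}\,d\lambda \\
&= \frac{(1+1/\theta)^{x+\alpha}}{y!\,\Gamma(x+\alpha)} \int_0^\infty \lambda^{y+x+\alpha-1}e^{-(2+1/\theta)\lambda}\,d\lambda \\
&= \frac{(1+1/\theta)^{x+\alpha}\,\Gamma(y+x+\alpha)}{y!\,\Gamma(x+\alpha)(2+1/\theta)^{y+x+\alpha}}, \quad y = 0, 1, \ldots,
\end{aligned}$$

and some rearranging shows this to be a negative binomial distribution with 
$r = x + \alpha$ and $\beta = \theta/(1+\theta)$. The expected number of claims for the next 
year is $(x+\alpha)\theta/(1+\theta)$. Alternatively, from (12.6), 

$$E(Y|x) = \int_0^\infty \lambda\,\frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}(1+1/\theta)^{x+\alpha}}{\Gamma(x+\alpha)}\,d\lambda = \frac{(x+\alpha)\theta}{1+\theta}.$$

For a sample of size n, the key change is that the model distribution is now 

$$p(\mathbf{x}|\lambda) = \frac{\lambda^{x_1+\cdots+x_n}e^{-n\lambda}}{x_1!\cdots x_n!}.$$

Following this through, the posterior distribution is still gamma, now with 
shape parameter $x_1 + \cdots + x_n + \alpha = n\bar{x} + \alpha$ and scale parameter $\theta/(1+n\theta)$. 
The predictive distribution is still negative binomial, now with $r = n\bar{x} + \alpha$ 
and $\beta = \theta/(1+n\theta)$. □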
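
The update rule is easy to express in code. In the sketch below the prior parameters and claim counts are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
import math


def poisson_gamma_update(alpha, theta, counts):
    """Posterior gamma (shape, scale) after observing Poisson counts."""
    n = len(counts)
    return alpha + sum(counts), theta / (1.0 + n * theta)


def neg_binomial_pf(y, r, beta):
    """Negative binomial pf with parameters r and beta, via log-gamma."""
    log_p = (
        math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
        - r * math.log1p(beta)
        + y * math.log(beta / (1.0 + beta))
    )
    return math.exp(log_p)


# Hypothetical inputs: gamma prior with alpha = 2, theta = 0.5, and
# three years of observed claim counts.
alpha0, theta0 = 2.0, 0.5
claims = [1, 0, 3]

shape, scale = poisson_gamma_update(alpha0, theta0, claims)

# The predictive distribution is negative binomial with r equal to the
# posterior shape and beta equal to the posterior scale.
predictive = [neg_binomial_pf(y, shape, scale) for y in range(200)]
pred_mean = sum(y * p for y, p in enumerate(predictive))

print(shape, scale, pred_mean)
```

Summing the pf over a long range is only a sanity check here; the mean is also available in closed form as $r\beta$.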


When only moments are needed, the double-expectation formulas can be 
very useful. Provided the moments exist, for any random variables X and Y, 


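$$E(Y) = E[E(Y|X)], \tag{12.8}$$
$$\mathrm{Var}(Y) = E[\mathrm{Var}(Y|X)] + \mathrm{Var}[E(Y|X)]. \tag{12.9}$$

For the predictive distribution, 

$$E(Y|\mathbf{x}) = E_{\Theta|\mathbf{x}}[E(Y|\Theta,\mathbf{x})] = E_{\Theta|\mathbf{x}}[E(Y|\Theta)]$$

and 

$$\begin{aligned}
\mathrm{Var}(Y|\mathbf{x}) &= E_{\Theta|\mathbf{x}}[\mathrm{Var}(Y|\Theta,\mathbf{x})] + \mathrm{Var}_{\Theta|\mathbf{x}}[E(Y|\Theta,\mathbf{x})] \\
&= E_{\Theta|\mathbf{x}}[\mathrm{Var}(Y|\Theta)] + \mathrm{Var}_{\Theta|\mathbf{x}}[E(Y|\Theta)].
\end{aligned}$$

The simplification on the inner expected value and variance results from the 
fact that, if $\Theta$ is known, the value of $\mathbf{x}$ provides no additional information 
about the distribution of Y. This is simply a restatement of (12.6). 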


Example 12.43 Apply these formulas to obtain the predictive mean and
variance for the previous example. Then anticipate the credibility formulas of 
Chapter 16. 


The predictive mean uses $E(Y|\lambda) = \lambda$. Then, 

$$E(Y|\mathbf{x}) = E(\lambda|\mathbf{x}) = \frac{(n\bar{x}+\alpha)\theta}{1+n\theta}.$$

The predictive variance uses $\mathrm{Var}(Y|\lambda) = \lambda$, and then 

$$\begin{aligned}
\mathrm{Var}(Y|\mathbf{x}) &= E(\lambda|\mathbf{x}) + \mathrm{Var}(\lambda|\mathbf{x}) \\
&= \frac{(n\bar{x}+\alpha)\theta}{1+n\theta} + \frac{(n\bar{x}+\alpha)\theta^2}{(1+n\theta)^2} \\
&= (n\bar{x}+\alpha)\,\frac{\theta}{1+n\theta}\left(1+\frac{\theta}{1+n\theta}\right).
\end{aligned}$$

These agree with the mean and variance of the known negative binomial 
distribution for y. However, these quantities were obtained from moments 
of the model (Poisson) and posterior (gamma) distributions. The predictive 
mean can be written as 

$$\frac{n\theta}{1+n\theta}\,\bar{x} + \frac{1}{1+n\theta}\,\alpha\theta,$$

which is a weighted average of the mean of the data and the mean of the 
prior distribution. Note that as the sample size increases more weight is 
placed on the data and less on the prior opinion. The variance of the prior 
distribution can be increased by letting $\theta$ become large. As it should, this also 
increases the weight placed on the data. The credibility formulas in Chapter 
16 generally consist of weighted averages of an estimate from the data and a 
prior opinion. □
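
The weighted-average form can be confirmed numerically. The following sketch uses hypothetical prior parameters and claim counts, and the function names are introduced here for illustration only.

```python
def predictive_moments(alpha, theta, counts):
    """Predictive mean and variance from the double-expectation formulas."""
    n = len(counts)
    total = sum(counts)
    post_scale = theta / (1.0 + n * theta)
    post_mean = (total + alpha) * post_scale        # E(lambda | x)
    post_var = (total + alpha) * post_scale ** 2    # Var(lambda | x)
    # E(Y|x) = E(lambda|x); Var(Y|x) = E(lambda|x) + Var(lambda|x).
    return post_mean, post_mean + post_var


def credibility_mean(alpha, theta, counts):
    """Predictive mean as a weighted average of x-bar and the prior mean."""
    n = len(counts)
    weight = n * theta / (1.0 + n * theta)   # weight placed on the data
    xbar = sum(counts) / n
    prior_mean = alpha * theta
    return weight * xbar + (1.0 - weight) * prior_mean, weight


# Hypothetical inputs, chosen only for illustration.
alpha0, theta0 = 2.0, 0.5
claims = [1, 0, 3]

mean_y, var_y = predictive_moments(alpha0, theta0, claims)
cred_mean, weight = credibility_mean(alpha0, theta0, claims)
print(mean_y, var_y, cred_mean, weight)
```

Increasing either the number of observations or the prior scale `theta0` pushes `weight` toward 1, matching the remarks above.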


12.4.3 Conjugate prior distributions and the linear exponential family 


A large parametric family that includes many of the distributions we have 
encountered so far has a special use in Bayesian analysis. The definition is as 
follows: 


Definition 12.44 A random variable X (discrete or continuous) has a
distribution which is from the linear exponential family if its pf may be
parameterized in terms of a parameter $\theta$ and expressed as 

$$f(x;\theta) = \frac{p(x)e^{-\theta x}}{q(\theta)}. \tag{12.10}$$

The function $p(x)$ depends only on x (not on $\theta$), and the function $q(\theta)$ is a
normalizing constant. Also, the support of the random variable must not depend 
on $\theta$. The parameter $\theta$ is called the natural parameter of the distribution. 


Example 12.45 Show that the exponential distribution is of the form (12.10). 


The pdf is 

$$f(x;\mu) = \mu^{-1}e^{-x/\mu}.$$

If we let $\theta = 1/\mu$, then the pdf is 

$$f(x;\theta) = \frac{e^{-\theta x}}{1/\theta},$$

which is of the form (12.10) with $p(x) = 1$ and $q(\theta) = 1/\theta$. □


Example 12.46 Show that the Poisson distribution is a member of the linear 
exponential family. 


The pf is 

$$f(x;\lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{(1/x!)e^{x\ln\lambda}}{e^{\lambda}}.$$

If we let $\theta = -\ln\lambda$, then the pf is 

$$f(x;\theta) = \frac{(1/x!)e^{-\theta x}}{\exp(e^{-\theta})},$$

which is of the form (12.10) with $p(x) = 1/x!$ and $q(\theta) = \exp(e^{-\theta})$. Note that in 
this parameterization the Poisson mean is $e^{-\theta}$. □


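Example 12.47 Show that the normal distribution with mean $\mu$ and known 
variance v is a member of the linear exponential family. 


The pdf is 

$$\begin{aligned}
f(x;\mu,v) &= (2\pi v)^{-1/2}\exp\left[-\frac{1}{2v}(x-\mu)^2\right] \\
&= (2\pi v)^{-1/2}\exp\left(-\frac{x^2}{2v} + \frac{\mu}{v}x - \frac{\mu^2}{2v}\right).
\end{aligned}$$

If we let $\theta = -\mu/v$, the pdf is 

$$f(x;\theta,v) = \frac{(2\pi v)^{-1/2}\exp[-x^2/(2v)]\,e^{-\theta x}}{\exp(\theta^2 v/2)},$$

which is of the form (12.10) with $p(x) = (2\pi v)^{-1/2}\exp[-x^2/(2v)]$ and $q(\theta) = 
\exp(\theta^2 v/2)$. □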


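We now find the mean and variance of the distribution defined by (12.10). 
First, note that 

$$\ln f(x;\theta) = \ln p(x) - \theta x - \ln q(\theta).$$

Differentiate with respect to $\theta$ to obtain 

$$\frac{\partial}{\partial\theta}f(x;\theta) = \left[-x - \frac{q'(\theta)}{q(\theta)}\right]f(x;\theta). \tag{12.11}$$

Integrate (or sum) over the range of x (known not to depend on $\theta$) to obtain 

$$\int \frac{\partial}{\partial\theta}f(x;\theta)\,dx = -\int x f(x;\theta)\,dx - \frac{q'(\theta)}{q(\theta)}\int f(x;\theta)\,dx.$$

On the left-hand side, interchange the order of differentiation and integration 
(or summation) to obtain 

$$\frac{\partial}{\partial\theta}\left[\int f(x;\theta)\,dx\right] = -\int x f(x;\theta)\,dx - \frac{q'(\theta)}{q(\theta)}\int f(x;\theta)\,dx.$$

We know that $\int f(x;\theta)\,dx = 1$ and $\int x f(x;\theta)\,dx = E(X)$ and thus 

$$E(X) = \mu(\theta) = -\frac{q'(\theta)}{q(\theta)} = -\frac{d}{d\theta}\ln q(\theta). \tag{12.12}$$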
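To obtain the variance, (12.11) may first be rewritten as 

$$\frac{\partial}{\partial\theta}f(x;\theta) = -[x - \mu(\theta)]f(x;\theta).$$

Differentiate again with respect to $\theta$ to obtain 

$$\begin{aligned}
\frac{\partial^2}{\partial\theta^2}f(x;\theta) &= \mu'(\theta)f(x;\theta) - [x - \mu(\theta)]\frac{\partial}{\partial\theta}f(x;\theta) \\
&= \mu'(\theta)f(x;\theta) + [x - \mu(\theta)]^2 f(x;\theta).
\end{aligned}$$

Again, integrate over the range of x to obtain 

$$\int \frac{\partial^2}{\partial\theta^2}f(x;\theta)\,dx = \mu'(\theta)\int f(x;\theta)\,dx + \int [x - \mu(\theta)]^2 f(x;\theta)\,dx.$$

In other words 

$$\int [x - \mu(\theta)]^2 f(x;\theta)\,dx = -\mu'(\theta) + \frac{\partial^2}{\partial\theta^2}\int f(x;\theta)\,dx.$$

Because $\mu(\theta)$ is the mean, the left-hand side is the variance (by definition), 
and then because the second term on the right-hand side is zero, we obtain 

$$\mathrm{Var}(X) = v(\theta) = -\mu'(\theta) = \frac{d^2}{d\theta^2}\ln q(\theta). \tag{12.13}$$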
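
Formulas (12.12) and (12.13) can be sanity-checked numerically for the Poisson case of Example 12.46, where $\ln q(\theta) = e^{-\theta}$. The sketch below uses finite differences with step sizes chosen for illustration, and compares against moments computed directly from the pf; the value $\lambda = 2.5$ is an arbitrary choice.

```python
import math


def log_q(theta):
    """ln q(theta) for the Poisson case, where q(theta) = exp(exp(-theta))."""
    return math.exp(-theta)


def mean_from_q(theta, h=1e-5):
    """(12.12): mu(theta) = -d/dtheta ln q(theta), by central difference."""
    return -(log_q(theta + h) - log_q(theta - h)) / (2.0 * h)


def var_from_q(theta, h=1e-4):
    """(12.13): v(theta) = d^2/dtheta^2 ln q(theta), by second difference."""
    return (log_q(theta + h) - 2.0 * log_q(theta) + log_q(theta - h)) / h ** 2


lam = 2.5                  # arbitrary Poisson mean for the check
theta = -math.log(lam)     # natural parameter

# Direct moments from the Poisson pf, truncated far into the tail.
pf = [math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1)) for x in range(100)]
direct_mean = sum(x * p for x, p in enumerate(pf))
direct_var = sum((x - direct_mean) ** 2 * p for x, p in enumerate(pf))

print(mean_from_q(theta), direct_mean)
print(var_from_q(theta), direct_var)
```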


In Example 12.42 it turned out the posterior distribution was of the same 
type as the prior distribution (gamma). This makes calculations relatively 
easy. A definition of this concept follows. 


Definition 12.48 A prior distribution is said to be a conjugate prior dis- 
tribution for a given model if the resulting posterior distribution is of the 
same type as the prior (but perhaps with different parameters). 


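The following theorem shows that, if the model is a member of the linear 
exponential family, a conjugate prior distribution is easy to find. 

Theorem 12.49 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ 
are independent and identically distributed with pf 

$$f_{X_j|\Theta}(x_j|\theta) = \frac{p(x_j)e^{-\theta x_j}}{q(\theta)},$$

where $\Theta$ has pdf 

$$\pi(\theta) = \frac{[q(\theta)]^{-k}e^{-\theta\mu k}}{c(\mu,k)},$$

where k and $\mu$ are parameters of the distribution and $c(\mu,k)$ is the normalizing 
constant. Then the posterior pf $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ is of the same form as $\pi(\theta)$. 


Proof: The posterior distribution is 

$$\begin{aligned}
\pi(\theta|\mathbf{x}) &\propto \frac{\left[\prod_{j=1}^{n}p(x_j)\right]\exp\left(-\theta\sum_{j=1}^{n}x_j\right)}{[q(\theta)]^{n}}\,\frac{[q(\theta)]^{-k}e^{-\theta\mu k}}{c(\mu,k)} \\
&\propto [q(\theta)]^{-(k+n)}\exp\left[-\theta\left(\frac{\mu k+\sum x_j}{k+n}\right)(k+n)\right] \\
&= [q(\theta)]^{-k^*}\exp(-\theta\mu^* k^*),
\end{aligned}$$

which is of the same form as $\pi(\theta)$ with parameters 

$$k^* = k + n,$$

$$\mu^* = \frac{\mu k+\sum x_j}{k+n} = \frac{k}{k+n}\,\mu + \frac{n}{k+n}\,\bar{x}. \qquad \square$$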


Example 12.50 Show that for the Poisson model the conjugate prior as given 
in Theorem 12.49 is the gamma distribution. 


From Example 12.46 we have that $q(\theta) = \exp(e^{-\theta})$ and $\lambda = \exp(-\theta)$. The 
prior as given by the theorem is 

$$\pi(\theta) \propto [\exp(e^{-\theta})]^{-k}\exp(-\theta\mu k).$$

Then the prior density for $\lambda$ is 

$$\pi(\lambda) \propto [\exp(\lambda)]^{-k}\lambda^{\mu k}\lambda^{-1} = \lambda^{\mu k-1}e^{-k\lambda},$$

which is a gamma distribution with $\alpha = \mu k$ and $\theta = 1/k$. The term $\lambda^{-1}$ 
appears because it is $|d\theta/d\lambda|$, which is needed for the change of variable. □

12.4.4 Computational issues 


It should be obvious by now that all Bayesian analyses proceed by taking
integrals or sums. So at least conceptually it is always possible to do a Bayesian 
analysis. However, only in rare cases are the integrals or sums easy to do, and 
that means most Bayesian analyses will require numerical integration. While 
one-dimensional integrations are easy to do to a high degree of accuracy, mul- 
tidimensional integrals are much more difficult to approximate. A great deal 
of effort has been expended with regard to solving this problem. A number 
of ingenious methods have been developed. Some of them are summarized in 
Klugman [77]. However, the one that is widely used today is called Markov 
chain Monte Carlo simulation. A good discussion of this method can be found 
in [118] and actuarial applications can be found in [21] and [119]. 

There is another way which completely avoids computational problems. 
This is illustrated using the example (in an abbreviated form) from Meyers 
[92], which also employed this technique. The example also shows how a 
Bayesian analysis is used to estimate a function of parameters. 


Example 12.51 Data were collected on 100 losses in excess of 100,000. The 
single-parameter Pareto distribution is to be used with $\theta = 100{,}000$ and $\alpha$ 
unknown. The objective is to estimate the layer average severity for the layer 
from 1,000,000 to 5,000,000. For the observations, $\sum_{j=1}^{100}\ln x_j = 1{,}208.4354$. 


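The model density is 

$$\begin{aligned}
f_{\mathbf{X}|A}(\mathbf{x}|\alpha) &= \prod_{j=1}^{100}\frac{\alpha(100{,}000)^{\alpha}}{x_j^{\alpha+1}} \\
&= \exp\left[100\ln\alpha + 100\alpha\ln 100{,}000 - (\alpha+1)\sum_{j=1}^{100}\ln x_j\right] \\
&= \exp\left(100\ln\alpha - \frac{100\alpha}{1.75} - 1{,}208.4354\right).
\end{aligned}$$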

The density appears in column 3 of Table 12.6. To prevent computer underflow, 
the value 1,208.4354 was not subtracted prior to exponentiation. This makes 
the entries proportional to the true density function. The prior density is 
given in the second column. It was chosen based on a belief that the true 
value is in the range 1-2.5 and is more likely to be near 1.5 than at the ends. 
The posterior density is then obtained using (12.2). The elements of the 
numerator are found in column 4. The denominator is no longer an integral 
but a sum. The sum is at the bottom of column 4 and then the scaled values 
are in column 5. 

We can see from column 5 that the posterior mode is at $\alpha = 1.7$, as 
compared to the maximum likelihood estimate of 1.75 (see Exercise 12.69). 
The posterior mean of $\alpha$ could be found by adding the product of columns 1 
and 5. Here we are interested in a layer average severity. For this problem it is 

$$\begin{aligned}
\mathrm{LAS}(\alpha) &= E(X \wedge 5{,}000{,}000) - E(X \wedge 1{,}000{,}000) \\
&= \frac{100{,}000^{\alpha}}{\alpha-1}\left(\frac{1}{1{,}000{,}000^{\alpha-1}} - \frac{1}{5{,}000{,}000^{\alpha-1}}\right), \quad \alpha \ne 1, \\
&= 100{,}000\,(\ln 5{,}000{,}000 - \ln 1{,}000{,}000), \quad \alpha = 1.
\end{aligned}$$

Table 12.6 Bayesian estimation of a layer average severity 

 α     π(α)     f(x|α)        π(α)f(x|α)    π(α|x)    LAS(α)    π×L*     π(α|x)LAS(α)^2
 1.0   0.0400   1.52×10^-25   6.10×10^-27   0.0000    160,944       0            6,433
 1.1   0.0496   6.93×10^-24   3.44×10^-25   0.0000    118,085       2          195,201
 1.2   0.0592   1.37×10^-22   8.13×10^-24   0.0003     86,826      29        2,496,935
 1.3   0.0688   1.36×10^-21   9.33×10^-23   0.0038     63,979     243       15,558,906
 1.4   0.0784   7.40×10^-21   5.80×10^-22   0.0236     47,245   1,116       52,737,840
 1.5   0.0880   2.42×10^-20   2.13×10^-21   0.0867     34,961   3,033      106,021,739
 1.6   0.0832   5.07×10^-20   4.22×10^-21   0.1718     25,926   4,454      115,480,050
 1.7   0.0784   7.18×10^-20   5.63×10^-21   0.2293     19,265   4,418       85,110,453
 1.8   0.0736   7.19×10^-20   5.29×10^-21   0.2156     14,344   3,093       44,366,353
 1.9   0.0688   5.29×10^-20   3.64×10^-21   0.1482     10,702   1,586       16,972,802
 2.0   0.0640   2.95×10^-20   1.89×10^-21   0.0768      8,000     614        4,915,383
 2.1   0.0592   1.28×10^-20   7.57×10^-22   0.0308      5,992     185        1,106,259
 2.2   0.0544   4.42×10^-21   2.40×10^-22   0.0098      4,496      44          197,840
 2.3   0.0496   1.24×10^-21   6.16×10^-23   0.0025      3,380       8           28,650
 2.4   0.0448   2.89×10^-22   1.29×10^-23   0.0005      2,545       1      [illegible]
 2.5   0.0400   5.65×10^-23   2.26×10^-24   0.0001      1,920       0      [illegible]

       1.0000                 2.46×10^-20   1.0000             18,827      445,198,597
*π(α|x)LAS(α)
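
The spreadsheet calculation behind Table 12.6 can be sketched in a few lines. This version assumes the prior weights read from column 2 of the table, and it works with log weights and subtracts the maximum before exponentiating, which is a common way (not necessarily the one used for the table) of avoiding underflow. The function and variable names are illustrative.

```python
import math

SUM_LOG_X = 1208.4354          # sum of ln x_j over the 100 losses
N, THETA = 100, 100_000.0      # sample size and known Pareto scale

alphas = [round(1.0 + 0.1 * i, 1) for i in range(16)]
prior = [0.0400, 0.0496, 0.0592, 0.0688, 0.0784, 0.0880, 0.0832, 0.0784,
         0.0736, 0.0688, 0.0640, 0.0592, 0.0544, 0.0496, 0.0448, 0.0400]


def log_likelihood(a):
    """Log of the single-parameter Pareto likelihood at alpha = a."""
    return N * math.log(a) + N * a * math.log(THETA) - (a + 1.0) * SUM_LOG_X


def las(a):
    """Layer average severity for the layer from 1,000,000 to 5,000,000."""
    if a == 1.0:
        return THETA * (math.log(5e6) - math.log(1e6))
    return THETA ** a / (a - 1.0) * (1e6 ** (1.0 - a) - 5e6 ** (1.0 - a))


# Posterior on the grid: prior times likelihood, rescaled to sum to 1.
log_w = [math.log(p) + log_likelihood(a) for a, p in zip(alphas, prior)]
shift = max(log_w)
weights = [math.exp(lw - shift) for lw in log_w]
post = [w / sum(weights) for w in weights]

post_mode = alphas[post.index(max(post))]
las_mean = sum(p * las(a) for a, p in zip(alphas, post))
las_second = sum(p * las(a) ** 2 for a, p in zip(alphas, post))
las_sd = math.sqrt(las_second - las_mean ** 2)

print(post_mode, round(las_mean), round(las_sd))
```

Refining the grid only requires changing `alphas` and supplying matching prior weights.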


Values of $\mathrm{LAS}(\alpha)$ for the 16 possible values of $\alpha$ appear in column 6. The 
last two columns are then used to obtain the posterior expected values of the 
layer average severity. The point estimate is the posterior mean, 18,827. The 
posterior standard deviation is 

$$\sqrt{445{,}198{,}597 - 18{,}827^2} = 9{,}526.$$

We can also use columns 5 and 6 to construct a credibility interval.
Discarding the first five rows and the last four rows eliminates 0.0406 of posterior 
probability. That leaves (5,992, 34,961) as a 96% credibility interval for the 
layer average severity. Part of Meyers' paper was the observation that even 
with a fairly large sample the accuracy of the estimate is poor. 

The discrete approximation to the prior distribution could be refined by 
using many more than 16 values. This adds little to the spreadsheet effort. 
The number was kept small here only for display purposes. □

12.4.5 Exercises 


12.67 Show that, if Y is the predictive distribution in Example 12.28, then 
$\ln Y - \ln 100$ has the Pareto distribution. 


12.68 Determine the posterior distribution of $\alpha$ in Example 12.28 if the prior 
distribution is an arbitrary gamma distribution. To avoid confusion, denote 
the first parameter of this gamma distribution by $\gamma$. Next determine a
particular combination of gamma parameters so that the posterior mean is the
maximum likelihood estimate of $\alpha$ regardless of the specific values of $x_1, \ldots, x_n$. 
Is this prior improper? 


12.69 For Example 12.51 demonstrate that the maximum likelihood estimate 
of $\alpha$ is 1.75. 

12.70 Let $x_1, \ldots, x_n$ be a random sample from a lognormal distribution with 
unknown parameters $\mu$ and $\sigma$. Let the prior density be $\pi(\mu,\sigma) = \sigma^{-1}$. 


(a) Write the posterior pdf of $\mu$ and $\sigma$ up to a constant of
proportionality. 

(b) Determine Bayesian estimators of $\mu$ and $\sigma$ by using the posterior 
mode. 

(c) Fix $\sigma$ at the posterior mode as determined in part (b) and then
determine the exact (conditional) pdf of $\mu$. Then use it to determine 
a 95% HPD credibility interval for $\mu$. 


12.71 A random sample of size 100 has been taken from a gamma distribution 
with $\alpha$ known to be 2, but $\theta$ unknown. For this sample, $\sum_{j=1}^{100} x_j = 30{,}000$. 
The prior distribution for $\theta$ is inverse gamma with $\beta$ taking the role of $\alpha$ and 
$\lambda$ taking the role of $\theta$. 


(a) Determine the exact posterior distribution of $\theta$. At this point the 
values of $\beta$ and $\lambda$ have yet to be specified. 


(b) The population mean is $2\theta$. Determine the posterior mean of $2\theta$ 
using the prior distribution first with $\beta = \lambda = 0$ [this is equivalent 
to $\pi(\theta) = \theta^{-1}$] and then with $\beta = 2$ and $\lambda = 250$ (which is a prior 
mean of 250). Then, in each case, determine a 95% credibility 
interval with 2.5% probability on each side. 

(c) Determine the posterior variance of $2\theta$ and use the Bayesian central 
limit theorem to construct a 95% credibility interval for $2\theta$ using 
each of the two prior distributions given in part (b). 


(d) Determine the maximum likelihood estimate of $\theta$ and then use the 
estimated variance to construct a 95% confidence interval for $2\theta$. 


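12.72 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent and binomially distributed with pf 

$$f_{X_j|\Theta}(x_j|\theta) = \binom{K_j}{x_j}\theta^{x_j}(1-\theta)^{K_j-x_j}, \quad x_j = 0, 1, \ldots, K_j,$$

and $\Theta$ itself is beta distributed with parameters a and b and pdf 

$$\pi(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}, \quad 0 < \theta < 1.$$

(a) Verify that the marginal pf of $X_j$ is 

$$f_{X_j}(x_j) = \binom{K_j}{x_j}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\frac{\Gamma(a+x_j)\Gamma(b+K_j-x_j)}{\Gamma(a+b+K_j)}, \quad x_j = 0, 1, \ldots, K_j,$$

and $E(X_j) = aK_j/(a+b)$. This distribution is termed the binomial-beta
or negative hypergeometric distribution. 


(b) Determine the posterior pdf $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ and the posterior mean 
$E(\Theta|\mathbf{x})$. 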


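12.73 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent and identically exponentially distributed with pdf 

$$f_{X_j|\Theta}(x_j|\theta) = \theta e^{-\theta x_j}, \quad x_j > 0,$$

and $\Theta$ is itself gamma distributed with parameters $\alpha > 1$ and $\beta > 0$, 

$$\pi(\theta) = \frac{\theta^{\alpha-1}e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}}, \quad \theta > 0.$$

(a) Verify that the marginal pdf of $X_j$ is 

$$f_{X_j}(x_j) = \alpha\beta^{-\alpha}(\beta^{-1}+x_j)^{-\alpha-1}, \quad x_j > 0,$$

and 

$$E(X_j) = \frac{1}{\beta(\alpha-1)}.$$

This distribution is one form of the Pareto distribution. 


(b) Determine the posterior pdf $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ and the posterior mean 
$E(\Theta|\mathbf{x})$. 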


12.74 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent and identically negative binomially distributed with parameters r 
and $\theta$ with pf 

$$f_{X_j|\Theta}(x_j|\theta) = \binom{r+x_j-1}{x_j}\theta^{r}(1-\theta)^{x_j}, \quad x_j = 0, 1, 2, \ldots,$$

and $\Theta$ itself is beta distributed with parameters a and b and pdf 

$$\pi(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}, \quad 0 < \theta < 1.$$

(a) Verify that the marginal pf of $X_j$ is 

$$f_{X_j}(x_j) = \frac{\Gamma(r+x_j)}{\Gamma(r)\,x_j!}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\frac{\Gamma(a+r)\Gamma(b+x_j)}{\Gamma(a+r+b+x_j)}, \quad x_j = 0, 1, 2, \ldots,$$

and 

$$E(X_j) = \frac{rb}{a-1}.$$

This distribution is termed the generalized Waring distribution.
The special case where b = 1 is the Waring distribution 
and the Yule distribution if r = 1 and b = 1. 


(b) Determine the posterior pdf $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ and the posterior mean 
$E(\Theta|\mathbf{x})$. 


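12.75 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent and identically normally distributed with mean $\mu$ and variance $\theta^{-1}$ 
and $\Theta$ is gamma distributed with parameters $\alpha$ and ($\theta$ replaced by) $1/\beta$. 


(a) Verify that the marginal pdf of $X_j$ is 

$$f_{X_j}(x_j) = \frac{\Gamma(\alpha+\frac{1}{2})}{\sqrt{2\pi\beta}\,\Gamma(\alpha)}\left[1+\frac{1}{2\beta}(x_j-\mu)^2\right]^{-\alpha-1/2}, \quad -\infty < x_j < \infty,$$

which is a form of the t-distribution. 


(b) Determine the posterior pdf $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ and the posterior mean 
$E(\Theta|\mathbf{x})$. 
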
12.76 Prove that the binomial distribution with pf 

$$f(x;q) = \binom{n}{x}q^{x}(1-q)^{n-x}$$

is of the form (12.10) and identify $\theta$, $p(x)$, and $q(\theta)$. 

12.77 Consider the negative binomial distribution with pf 

$$f(x;\alpha,\beta) = \frac{\Gamma(\alpha+x)}{\Gamma(\alpha)\,x!}\left(\frac{\beta}{1+\beta}\right)^{x}\left(\frac{1}{1+\beta}\right)^{\alpha}.$$

If $\alpha$ is fixed, show that $f(x;\alpha,\beta)$ is of the form (12.10) and identify $\theta$, $p(x)$, 
and $q(\theta)$. 

12.78 Suppose $X_1, \ldots, X_n$ are independent and identically distributed with 
distribution (12.10). Prove that the maximum likelihood estimate of the mean 
is the sample mean. In other words, if $\hat{\theta}$ is the maximum likelihood estimator 
of $\theta$, prove that 

$$\widehat{\mu(\theta)} = \mu(\hat{\theta}) = \bar{X}.$$

12.79 Consider the generalization of (12.10) given by 

$$f(x;\theta) = \frac{p(m,x)e^{-m\theta x}}{[q(\theta)]^{m}},$$

where m is a known parameter. Prove that the mean is still given by (12.12) 
but the variance is given by $v(\theta)/m$, where $v(\theta)$ is given by (12.13). 


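12.80 Let $X_1, \ldots, X_n$ be independent and identically distributed conditional 
on $\Theta$ with pf 

$$f_{X_j|\Theta}(x_j|\theta) = \frac{p(x_j)e^{-\theta x_j}}{q(\theta)}.$$

Let $S = X_1 + \cdots + X_n$. 

(a) Show that, conditional on $\Theta$, S has pf of the form 

$$f_{S|\Theta}(s|\theta) = \frac{p_n(s)e^{-\theta s}}{[q(\theta)]^{n}},$$

where $p_n(s)$ does not depend on $\theta$. 


(b) Prove that the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ is the same as the 
(conditional) distribution of $\Theta|S$, 

$$\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) = \frac{f_{S|\Theta}(s|\theta)\pi(\theta)}{f_S(s)},$$

where $\pi(\theta)$ is the pf of $\Theta$ and $f_S(s)$ is the marginal pf of S. 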


12.81 Suppose that given N the random variable X is binomially distributed 
with parameters N and p. 


(a) Show that, if N is Poisson distributed, so is X (unconditionally) 
and identify the parameters. 


(b) Show that, if N is binomially distributed, so is X (unconditionally) 
and identify the parameters. 


(c) Show that, if N is negative binomially distributed, so is X (uncon- 
ditionally) and identify the parameters. 


12.82 (*) A die is selected at random from an urn that contains two six-sided 
dice. Die number 1 has three faces with the number 2 while one face each has 
the numbers 1, 3, and 4. Die number 2 has three faces with the number 4 
while one face each has the numbers 1, 2, and 3. The first five rolls of the die 
yielded the numbers 2, 3, 4, 1, and 4 in that order. Determine the probability 
that the selected die was die number 2. 


12.83 (*) The number of claims in a year, Y, has a distribution which depends 
on a parameter $\theta$. As a random variable, $\Theta$ has the uniform distribution on 
the interval (0,1). The unconditional probability that Y is 0 is greater than 
0.35. For each conditional pf given below, determine if it is possible that it is 
the true conditional pf of Y. 


(a) $\Pr(Y = y|\theta) = e^{-\theta}\theta^{y}/y!$. 
(b) $\Pr(Y = y|\theta) = (y+1)\theta^{2}(1-\theta)^{y}$. 
(c) Pr(Y = y|6) = GPa — 0). 


12.84 (*) Your prior distribution concerning the unknown value of $H$ is
$\Pr(H = \frac{1}{4}) = \frac{4}{5}$ and $\Pr(H = \frac{1}{2}) = \frac{1}{5}$. The observation from a single
experiment has distribution $\Pr(D = d|H = h) = h^d(1-h)^{1-d}$ for $d = 0, 1$. The
result of a single experiment is $d = 1$. Determine the posterior distribution of
$H$.


12.85 (*) The number of claims in one year, $Y$, has the Poisson distribution
with parameter $\theta$. The parameter $\theta$ has the exponential distribution with pdf
$\pi(\theta) = e^{-\theta}$. A particular insured had no claims in one year. Determine the
posterior distribution of $\theta$ for this insured.


12.86 (*) The number of claims in one year, $Y$, has the Poisson distribution
with parameter $\theta$. The prior distribution has the gamma distribution with
pdf $\pi(\theta) = \theta e^{-\theta}$. There was one claim in one year. Determine the posterior
pdf of $\theta$.


12.87 (*) Each individual car's claim count has a Poisson distribution with
parameter $\lambda$. All individual cars have the same parameter. The prior
distribution is gamma with parameters $\alpha = 50$ and $\theta = 1/500$. In a two-year
period, the insurer covers 750 and 1,100 cars in years 1 and 2, respectively.
There were 65 and 112 claims in years 1 and 2, respectively. Determine
the coefficient of variation of the posterior gamma distribution.


12.88 (*) The number of claims, $r$, made by an individual in one year has the
binomial distribution with pf $f(r) = \binom{3}{r}\theta^r(1-\theta)^{3-r}$. The prior distribution
for $\theta$ has pdf $\pi(\theta) = 6(\theta - \theta^2)$. There was one claim in a one-year period.
Determine the posterior pdf of $\theta$.


12.89 (*) The number of claims for an individual in one year has a Poisson
distribution with parameter $\lambda$. The prior distribution for $\lambda$ has the gamma
distribution with mean 0.14 and variance 0.0004. During the past two years a
total of 110 claims has been observed. In each year there were 310 policies in
force. Determine the expected value and variance of the posterior distribution
of $\lambda$.


12.90 (*) The number of claims for an individual in one year has a Poisson
distribution with parameter $\lambda$. The prior distribution for $\lambda$ is exponential with
an expected value of 2. There were three claims in the first year. Determine
the posterior distribution of $\lambda$.


12.91 (*) The number of claims in one year has the binomial distribution
with $n = 3$ and $\theta$ unknown. The prior distribution for $\theta$ is beta with pdf
$\pi(\theta) = 280\theta^3(1-\theta)^4$, $0 < \theta < 1$. Two claims were observed. Determine
each of the following.


(a) The posterior distribution of $\theta$.
(b) The expected value of $\theta$ from the posterior distribution.


12.92 (*) An individual risk has exactly one claim each year. The amount of
the single claim has an exponential distribution with pdf $f(x) = te^{-tx}$, $x > 0$.
The parameter $t$ has a prior distribution with pdf $\pi(t) = te^{-t}$. A claim of 5
has been observed. Determine the posterior pdf of $t$.


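12.93 Suppose that given $\Theta_1 = \theta_1$ and $\Theta_2 = \theta_2$ the random variables $X_1,\ldots,X_n$
are independent and identically normally distributed with mean $\theta_1$ and
variance $\theta_2^{-1}$. Suppose also that the conditional distribution of $\Theta_1$ given
$\Theta_2 = \theta_2$ is a normal distribution with mean $\mu$ and variance $\sigma^2/\theta_2$ and $\Theta_2$ is
gamma distributed with parameters $\alpha$ and $\theta = 1/\beta$.


(a) Show that the posterior conditional distribution of $\Theta_1$ given $\Theta_2 = \theta_2$
is normally distributed with mean
$$\mu_* = \frac{1}{1+n\sigma^2}\,\mu + \frac{n\sigma^2}{1+n\sigma^2}\,\bar{x}$$
and variance
$$\sigma_*^2 = \frac{\sigma^2}{\theta_2(1+n\sigma^2)},$$
and the posterior marginal distribution of $\Theta_2$ is gamma distributed
with parameters
$$\alpha_* = \alpha + \frac{n}{2}$$
and
$$\beta_* = \beta + \frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{n(\bar{x}-\mu)^2}{2(1+n\sigma^2)}.$$


(b) Find the posterior marginal means $E(\Theta_1|\mathbf{x})$ and $E(\Theta_2|\mathbf{x})$.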
Table 12.7 Number of hospital liability claims by year


Year    Number of claims
1985    6
1986    2
1987    3
1988    0
1989    2
1990    1
1991    2
1992    5
1993    1
1994    3


Table 12.8 Hospital liability claims by frequency


Frequency (k)    Number of observations (n_k)
0                1
1                2
2                3
3                2
4                0
5                1
6                1
7+               0

12.5 ESTIMATION FOR DISCRETE DISTRIBUTIONS 


12.5.1 Poisson 


The principles of estimation discussed earlier in this chapter for continuous 
models can be applied equally to frequency distributions. We will now illus- 
trate the methods of estimation by fitting a Poisson model. 


Example 12.52 A hospital liability policy has experienced the number of 
claims over a 10-year period given in Table 12.7. Estimate the Poisson para- 
meter using the method of moments and the method of maximum likelihood. 


These data can be summarized in a different way. We can count the number 
of years in which exactly zero claims occurred, one claim occurred, and so on, 
as in Table 12.8. 

The total number of claims for the period 1985-1994 is 25. Hence, the
average number of claims per year is 2.5. The average can also be computed
from Table 12.8. Let $n_k$ denote the number of years in which a frequency of
exactly $k$ claims occurred. The expected frequency (sample mean) is

$$\bar{x} = \frac{\sum_{k=0}^{\infty} k n_k}{\sum_{k=0}^{\infty} n_k},$$

where $n_k$ represents the number of observed values at frequency $k$. Hence the
method-of-moments estimate of the Poisson parameter is $\hat{\lambda} = 2.5$.

Maximum likelihood estimation can easily be carried out on these data.
The likelihood contribution of an observation of $k$ is $p_k$. Then the likelihood
for the entire set of observations is

$$L = \prod_{k=0}^{\infty} p_k^{n_k}$$

and the loglikelihood is

$$l = \sum_{k=0}^{\infty} n_k \ln p_k.$$


The likelihood and loglikelihood functions are considered to be functions of
the unknown parameters. In the case of the Poisson distribution, there is only
one parameter, making the maximization easy.
For the Poisson distribution,

$$p_k = \frac{e^{-\lambda}\lambda^k}{k!}$$

and

$$\ln p_k = -\lambda + k\ln\lambda - \ln k!.$$

The loglikelihood is

$$l = \sum_{k=0}^{\infty} n_k(-\lambda + k\ln\lambda - \ln k!)
   = -\lambda n + \sum_{k=0}^{\infty} k n_k \ln\lambda - \sum_{k=0}^{\infty} n_k \ln k!,$$

where $n = \sum_{k=0}^{\infty} n_k$ is the sample size. Differentiating the loglikelihood with
respect to $\lambda$, we obtain

$$\frac{dl}{d\lambda} = -n + \sum_{k=0}^{\infty} k n_k \frac{1}{\lambda}.$$

By setting the derivative of the loglikelihood to zero, the maximum likelihood
estimate is obtained as the solution of the resulting equation. The estimator
is then

$$\hat{\lambda} = \frac{\sum_{k=0}^{\infty} k n_k}{n} = \bar{x}.$$

From this it can be seen that for the Poisson distribution the maximum
likelihood and the method-of-moments estimators are identical.
If $N$ has a Poisson distribution with mean $\lambda$, then

$$E(\hat{\lambda}) = E(N) = \lambda$$

and

$$\mathrm{Var}(\hat{\lambda}) = \frac{\mathrm{Var}(N)}{n} = \frac{\lambda}{n}.$$

Hence, $\hat{\lambda}$ is unbiased and consistent. From Theorem 12.13, the maximum
likelihood estimator is asymptotically normally distributed with mean $\lambda$ and
variance

$$\mathrm{Var}(\hat{\lambda}) \doteq \left\{-nE\left[\frac{d^2}{d\lambda^2}\ln p_N\right]\right\}^{-1}
 = \left\{-nE\left[\frac{d^2}{d\lambda^2}(-\lambda + N\ln\lambda - \ln N!)\right]\right\}^{-1}
 = \left[nE(N/\lambda^2)\right]^{-1}
 = \left(n\lambda^{-1}\right)^{-1} = \frac{\lambda}{n}.$$

In this case the asymptotic approximation to the variance is equal to its
true value. From this information, we can construct an approximate 95%
confidence interval for the true value of the parameter. The interval is
$\hat{\lambda} \pm 1.96(\hat{\lambda}/n)^{1/2}$. For this example, the interval becomes (1.52, 3.48). This
confidence interval is only an approximation because it relies on large sample
theory. The sample size is very small and such a confidence interval should
be used with caution. □
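The calculations in Example 12.52 are small enough to check directly. The following is a minimal sketch in Python, using only the claim counts from Table 12.7; the variable names are illustrative and not part of the text.

```python
import math

# Annual claim counts for 1985-1994, taken from Table 12.7.
claims = [6, 2, 3, 0, 2, 1, 2, 5, 1, 3]
n = len(claims)

# For the Poisson model the maximum likelihood estimate and the
# method-of-moments estimate are both the sample mean.
lam_hat = sum(claims) / n

# Variance of the estimator is lambda/n, evaluated here at the estimate.
std_err = math.sqrt(lam_hat / n)

# Approximate 95% confidence interval based on large-sample normality.
ci = (lam_hat - 1.96 * std_err, lam_hat + 1.96 * std_err)

print(f"lambda_hat = {lam_hat}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With only ten observations the normal approximation is rough, so the interval is best read as indicative.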


The formulas presented so far have assumed that the counts at each
observed frequency are known. Occasionally, data are collected so that this is
not given. The most common example is to have a final entry given as $k+$,
where the count is the number of times $k$ or more claims were observed. If $n_{k+}$
is the number of times this was observed, the contribution to the likelihood
function is

$$(p_k + p_{k+1} + \cdots)^{n_{k+}} = (1 - p_0 - \cdots - p_{k-1})^{n_{k+}}.$$

The same adjustments apply to grouped frequency data of any kind.
Suppose there were five observations at frequencies 3-5. The contribution to the
likelihood function is

$$(p_3 + p_4 + p_5)^5.$$




Table 12.9 Data for Example 12.53


No. of claims/day    Observed no. of days
[Table body illegible in the source: rows for 0, 1, 2, 3, 4, 5, and 6+ claims per day.]

Example 12.53 For the data in Table 12.9,$^{11}$ determine the maximum
likelihood estimate for the Poisson distribution.


The likelihood function is

$$L = p_0^{n_0}\, p_1^{n_1}\, p_2^{n_2}\, p_3^{n_3}\, p_4^{n_4}\, p_5^{n_5}\,(1 - p_0 - p_1 - p_2 - p_3 - p_4 - p_5)^{9},$$

where $n_k$ is the observed number of days with $k$ claims and nine days had six or
more claims. When written as a function of $\lambda$, it becomes somewhat complicated. While
the derivative can be taken, solving the equation when it is set equal to zero
will require numerical methods. It may be just as easy to use a numerical
method to directly maximize the function. A reasonable starting value can be
obtained by assuming that all nine observations were exactly at 6 and then
using the sample mean. Of course, this will understate the true maximum
likelihood estimate, but should be a good place to start. For this particular
example, the maximum likelihood estimate is $\hat{\lambda} = 2.0226$, which is very close
to the value obtained when all the counts were recorded. □


12.5.2 Negative binomial 


The moment equations are

$$r\beta = \frac{\sum_{k=0}^{\infty} k n_k}{n} = \bar{x} \tag{12.14}$$

and

$$r\beta(1+\beta) = \frac{\sum_{k=0}^{\infty} k^2 n_k}{n} - \left(\frac{\sum_{k=0}^{\infty} k n_k}{n}\right)^2 = s^2, \tag{12.15}$$

with solutions $\hat{\beta} = (s^2/\bar{x}) - 1$ and $\hat{r} = \bar{x}/\hat{\beta}$. Note that this variance estimate is
obtained by dividing by $n$, not $n - 1$. This is a common, though not required,
approach when using the method of moments. Also note that, if $s^2 < \bar{x}$, the
estimate of $\beta$ will be negative, an inadmissible value.


11 This is the same data as will be analyzed in Example 13.15 except the observations at 6
or more have been combined.


Example 12.54 (Example 12.52 continued) Estimate the negative binomial
parameters by the method of moments.


The sample mean and the sample variance are 2.5 and 3.05 (verify this),
respectively, and the estimates of the parameters are $\hat{r} = 11.364$ and
$\hat{\beta} = 0.22$. □


When compared to the Poisson distribution with the same mean, it can be
seen that $\beta$ is a measure of "extra-Poisson" variation. A value of $\beta = 0$ means
no extra-Poisson variation, while a value of $\beta = 0.22$ implies a 22% increase in
the variance when compared to the Poisson distribution with the same mean.
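The "verify this" in Example 12.54 can be carried out in a few lines. This is a minimal sketch assuming the Table 12.7 counts; the names `beta_mom` and `r_mom` are illustrative only.

```python
# Annual claim counts from Table 12.7.
claims = [6, 2, 3, 0, 2, 1, 2, 5, 1, 3]
n = len(claims)

mean = sum(claims) / n
# Sample variance with divisor n (not n - 1), as in (12.15).
var = sum(x * x for x in claims) / n - mean ** 2

# Solutions of the moment equations (12.14) and (12.15).
beta_mom = var / mean - 1.0
r_mom = mean / beta_mom

print(f"mean = {mean}, variance = {var:.2f}")
print(f"beta = {beta_mom:.2f}, r = {r_mom:.3f}")
```

If the variance were below the mean, `beta_mom` would come out negative, which is the inadmissible case noted above.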

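We now examine maximum likelihood estimation. The loglikelihood for
the negative binomial distribution is

$$l = \sum_{k=0}^{\infty} n_k \ln p_k
   = \sum_{k=0}^{\infty} n_k \left[\ln\binom{r+k-1}{k} - r\ln(1+\beta) + k\ln\beta - k\ln(1+\beta)\right].$$

The loglikelihood is a function of the two parameters $\beta$ and $r$. In order to find
the maximum of the loglikelihood, we differentiate with respect to each of the
parameters, set the derivatives equal to zero, and solve for the parameters.
The derivatives of the loglikelihood are

$$\frac{\partial l}{\partial\beta} = \sum_{k=0}^{\infty} n_k\left(\frac{k}{\beta} - \frac{r+k}{1+\beta}\right) \tag{12.16}$$

and

$$\begin{aligned}
\frac{\partial l}{\partial r}
 &= -\sum_{k=0}^{\infty} n_k \ln(1+\beta) + \sum_{k=0}^{\infty} n_k \frac{\partial}{\partial r}\ln\frac{(r+k-1)\cdots r}{k!} \\
 &= -n\ln(1+\beta) + \sum_{k=0}^{\infty} n_k \frac{\partial}{\partial r}\ln\prod_{m=0}^{k-1}(r+m) \\
 &= -n\ln(1+\beta) + \sum_{k=0}^{\infty} n_k \frac{\partial}{\partial r}\sum_{m=0}^{k-1}\ln(r+m) \\
 &= -n\ln(1+\beta) + \sum_{k=1}^{\infty} n_k \sum_{m=0}^{k-1}\frac{1}{r+m}.
\end{aligned} \tag{12.17}$$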
Setting these equations to zero yields

$$\hat{\mu} = \hat{r}\hat{\beta} = \frac{\sum_{k=0}^{\infty} k n_k}{n} = \bar{x} \tag{12.18}$$

and

$$n\ln(1+\hat{\beta}) = \sum_{k=1}^{\infty} n_k\left(\sum_{m=0}^{k-1}\frac{1}{\hat{r}+m}\right). \tag{12.19}$$

Note that the maximum likelihood estimator of the mean is the sample mean
(as, by definition, in the method of moments). Equations (12.18) and (12.19)
can be solved numerically. Replacing $\hat{\beta}$ in (12.19) by $\hat{\mu}/\hat{r} = \bar{x}/\hat{r}$ yields the equation

$$H(\hat{r}) = n\ln\left(1 + \frac{\bar{x}}{\hat{r}}\right) - \sum_{k=1}^{\infty} n_k\left(\sum_{m=0}^{k-1}\frac{1}{\hat{r}+m}\right) = 0. \tag{12.20}$$

If the right-hand side of (12.15) is greater than the right-hand side of (12.14),
it can be shown that there is a unique solution of (12.20). If not, then the
negative binomial model is probably not a good model to use because the
sample variance does not exceed the sample mean.$^{12}$

Equation (12.20) can be solved numerically for $\hat{r}$ using the Newton-Raphson
method. The required equation for the $k$th iteration is

$$r_k = r_{k-1} - \frac{H(r_{k-1})}{H'(r_{k-1})}.$$

A useful starting value for $r_0$ is the moment-based estimator of $r$. Of course,
any numerical root-finding method (e.g., bisection, secant) may be used.

The loglikelihood is a function of two variables. It can be maximized
directly using methods like those described in Appendix F. For the case of
the negative binomial distribution with complete data, because we know the
estimator of the mean must be the sample mean, setting $\beta = \bar{x}/r$ reduces this
to a one-dimensional problem.


Example 12.55 Determine the maximum likelihood estimates of the negative 
binomial parameters for the data in Example 12.52. 


The maximum occurs at $\hat{r} = 10.9650$ and $\hat{\beta} = 0.227998$. □
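The Newton-Raphson iteration for (12.20) can be sketched as follows. This is an illustrative implementation, not the authors' code: it uses the frequencies of Table 12.8, starts from the moment estimate of $r$, and differentiates $H$ analytically. The stopping tolerance and iteration cap are arbitrary choices.

```python
import math

# Frequency k -> number of years n_k, from Table 12.8 (zero counts omitted).
freq = {0: 1, 1: 2, 2: 3, 3: 2, 5: 1, 6: 1}
n = sum(freq.values())
xbar = sum(k * nk for k, nk in freq.items()) / n

def H(r):
    """Left-hand side of (12.20) with beta replaced by xbar / r."""
    inner = sum(nk * sum(1.0 / (r + m) for m in range(k))
                for k, nk in freq.items())
    return n * math.log(1.0 + xbar / r) - inner

def H_prime(r):
    """Derivative of H with respect to r."""
    inner = sum(nk * sum(1.0 / (r + m) ** 2 for m in range(k))
                for k, nk in freq.items())
    return -n * xbar / (r * (r + xbar)) + inner

# Starting value: the method-of-moments estimate of r.
s2 = sum(k * k * nk for k, nk in freq.items()) / n - xbar ** 2
r = xbar / (s2 / xbar - 1.0)

for _ in range(100):
    step = H(r) / H_prime(r)
    r -= step
    if abs(step) < 1e-10:
        break

r_hat = r
beta_hat = xbar / r_hat   # from (12.18)
print(f"r_hat = {r_hat:.4f}, beta_hat = {beta_hat:.6f}")
```

Because $H$ is very flat near its root for these data, a safeguarded method such as bisection may be preferable when the starting value is poor.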


12 In other words, when the sample variance is less than or equal to the mean, the
loglikelihood function will not have a maximum. The function will keep increasing as r goes to
infinity and β goes to zero with the product remaining constant. This effectively says that
the negative binomial distribution that best matches the data is the Poisson distribution
that is a limiting case.


Example 12.56 Tröbliger [130] studied the driving habits of 23,589
automobile drivers in a class of automobile insurance by counting the number of
accidents per driver in a one-year time period. The data as well as fitted
Poisson and negative binomial distributions are given in Table 12.10. Based
on the information presented, which distribution appears to provide a better
model?


Table 12.10 Two models for automobile claims frequency


No. of         No. of     Poisson        Negative binomial
claims/year    drivers    expected       expected
0              20,592     20,420.9       20,596.8
1               2,651      2,945.1        2,631.0
2                 297        212.4          318.4
3                  41         10.2           37.8
4                   7          0.4            4.4
5                   0          0.0            0.5
6                   1          0.0            0.1
7+                  0          0.0            0.0
Parameters                λ = 0.144220   r = 1.11790
                                         β = 0.129010
Loglikelihood             -10,297.84     -10,223.42


The expected counts are found by multiplying the sample size (23,589) by
the probability assigned by the model. It is clear that the negative binomial
probabilities produce expected counts that are much closer to those that were
observed. In addition, the loglikelihood function is maximized at a
significantly higher value. Formal procedures for model selection (including what
it means to be significantly higher) are discussed in Chapter 13. However, in
this case, the superiority of the negative binomial model is apparent. □
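The expected counts in Table 12.10 can be reproduced from the fitted parameters. The sketch below assumes the parameter values printed in the table and uses the negative binomial pf in the $(r, \beta)$ parameterization of this section; the function names are illustrative.

```python
import math

n_drivers = 23589
observed = [20592, 2651, 297, 41, 7, 0, 1]   # counts for 0, 1, ..., 6 claims

lam = 0.144220                # fitted Poisson parameter (Table 12.10)
r, beta = 1.11790, 0.129010   # fitted negative binomial parameters

def poisson_pf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def negbin_pf(k, r, beta):
    # Gamma(r + k) / (Gamma(r) k!) * (1 / (1 + beta))^r * (beta / (1 + beta))^k
    log_p = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
             - r * math.log1p(beta)
             + k * (math.log(beta) - math.log1p(beta)))
    return math.exp(log_p)

poisson_expected = [n_drivers * poisson_pf(k, lam) for k in range(7)]
negbin_expected = [n_drivers * negbin_pf(k, r, beta) for k in range(7)]

for k in range(7):
    print(k, observed[k],
          round(poisson_expected[k], 1), round(negbin_expected[k], 1))
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions when $r$ is not an integer.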


12.5.3 Binomial 


The binomial distribution has two parameters, $m$ and $q$. Frequently, the
value of $m$ is known and fixed. In this case, only one parameter, $q$, needs to
be estimated. In many insurance situations, $q$ is interpreted as the probability
of some event such as death or disability. In such cases the value of $q$ is usually
estimated as

$$\hat{q} = \frac{\text{number of observed events}}{\text{maximum number of possible events}},$$

which is the method-of-moments estimator when $m$ is known.

In situations where frequency data are in the form of the previous examples
in this chapter, the value of the parameter $m$, the largest possible observation,
may be known and fixed or unknown. In any case, $m$ must be no smaller than
the largest observation. The loglikelihood is

$$l = \sum_{k=0}^{m} n_k \ln p_k
   = \sum_{k=0}^{m} n_k\left[\ln\binom{m}{k} + k\ln q + (m-k)\ln(1-q)\right].$$

When $m$ is known and fixed, one needs only maximize $l$ with respect to $q$:

$$\frac{\partial l}{\partial q} = \frac{1}{q}\sum_{k=0}^{m} k n_k - \frac{1}{1-q}\sum_{k=0}^{m}(m-k)n_k.$$

Setting this equal to zero yields

$$\hat{q} = \frac{1}{m}\,\frac{\sum_{k=0}^{m} k n_k}{\sum_{k=0}^{m} n_k},$$

which is the sample proportion of observed events. For the method of
moments, with $m$ fixed, the estimator of $q$ is the same as the maximum likelihood
estimator because the moment equation is

$$mq = \frac{\sum_{k=0}^{m} k n_k}{\sum_{k=0}^{m} n_k}.$$

When $m$ is unknown, the maximum likelihood estimator of $q$ is

$$\hat{q} = \frac{1}{\hat{m}}\,\frac{\sum_{k=0}^{\infty} k n_k}{\sum_{k=0}^{\infty} n_k}, \tag{12.21}$$

where $\hat{m}$ is the maximum likelihood estimate of $m$. An easy way to approach
the maximum likelihood estimation of $m$ and $q$ is to create a likelihood profile
for various possible values of $m$ as follows:


Step 1: Start with $\hat{m}$ equal to the largest observation.
Step 2: Obtain $\hat{q}$ using (12.21).
Step 3: Calculate the loglikelihood at these values.
Step 4: Increase $\hat{m}$ by 1.
Step 5: Repeat steps 2-4 until a maximum is found.


As with the negative binomial, there need not be a pair of parameters that
maximizes the likelihood function. In particular, if the sample mean is less
than or equal to the sample variance, the procedure above will lead to ever
increasing loglikelihood values as the value of $\hat{m}$ is increased. Once again, the
trend is toward a Poisson model. This can be checked out using the data from
Example 12.52.
Table 12.11 Number of claims per policy


No. of claims/policy    No. of policies
0                       5,367
1                       5,893
2                       2,870
3                         842
4                         163
5                          23
6                           1
7                           1
8+                          0

Example 12.57 The number of claims per policy during a one-year period
for a block of 15,160 insurance policies are given in Table 12.11. Obtain
moment-based and maximum likelihood estimators.


The sample mean and variance are 0.985422 and 0.890355, respectively.
The variance is smaller than the mean, suggesting the binomial as a reasonable
distribution to try. The method of moments leads to

$$mq = 0.985422$$

and

$$mq(1-q) = 0.890355.$$

Hence, $\hat{q} = 0.096474$ and $\hat{m} = 10.21440$. However, $m$ can only take on integer
values. We choose $\hat{m} = 10$ by rounding. Then we adjust the estimate of $\hat{q}$ to
0.0985422 from the first moment equation. Doing this will result in a model
variance which differs from the sample variance because
10(0.0985422)(1 - 0.0985422) = 0.888316. This shows one of the pitfalls of using the method of
moments with integer-valued parameters.

We now turn to maximum likelihood estimation. From the data, $m \ge 7$.
If $m$ is known, then only $q$ needs to be estimated. If $m$ is unknown, then we
can produce a likelihood profile by maximizing the likelihood for fixed values
of $m$ starting at 7 and increasing until a maximum is found. The results are
in Table 12.12.

The largest loglikelihood value occurs at $m = 10$. If, a priori, the value
of $m$ is unknown, then the maximum likelihood estimates of the parameters
are $\hat{m} = 10$ and $\hat{q} = 0.0985422$. This is the same as the adjusted moment
estimates. This is not necessarily the case for all data sets. □
Table 12.12 Binomial likelihood profile


m     q̂           -Loglikelihood
7     0.140775    19,273.56
8     0.123178    19,265.37
9     0.109491    19,262.02
10    0.098542    19,260.98
11    0.089584    19,261.11
12    0.082119    19,261.84
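The five-step likelihood profile can be sketched in a few lines of Python. This is a minimal illustration using the counts of Table 12.11; the helper name `profile` and the simple stopping rule (stop at the first decrease) are choices made here, not part of the text.

```python
import math

# Claims per policy -> number of policies, from Table 12.11.
counts = {0: 5367, 1: 5893, 2: 2870, 3: 842, 4: 163, 5: 23, 6: 1, 7: 1}
n = sum(counts.values())
total_claims = sum(k * nk for k, nk in counts.items())

def profile(m):
    """Return (q_hat, loglikelihood) for a fixed value of m, using (12.21)."""
    q = total_claims / (m * n)
    ll = sum(nk * (math.log(math.comb(m, k))
                   + k * math.log(q)
                   + (m - k) * math.log(1.0 - q))
             for k, nk in counts.items())
    return q, ll

# Step 1: start at the largest observation.
m = max(counts)
q, ll = profile(m)

# Steps 2-5: increase m while the loglikelihood keeps improving.
while True:
    q_next, ll_next = profile(m + 1)
    if ll_next <= ll:
        break
    m, q, ll = m + 1, q_next, ll_next

m_hat, q_hat, loglik = m, q, ll
print(f"m_hat = {m_hat}, q_hat = {q_hat:.6f}, loglikelihood = {loglik:.2f}")
```

When the sample variance is at least the sample mean, the loop above would not terminate, so a practical version should cap the largest value of $m$ tried.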


12.5.4 The (a, b, 1) class 


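Estimation of the parameters for the (a, b, 1) class follows the same general
principles that were used in connection with the (a, b, 0) class.

Assuming that the data are in the same form as the previous examples, the
likelihood is, using (4.13),

$$L = \left(p_0^M\right)^{n_0}\prod_{k=1}^{\infty}\left(p_k^M\right)^{n_k}
   = \left(p_0^M\right)^{n_0}\prod_{k=1}^{\infty}\left[(1-p_0^M)\,p_k^T\right]^{n_k}.$$

The loglikelihood is

$$\begin{aligned}
l &= n_0\ln p_0^M + \sum_{k=1}^{\infty} n_k\left[\ln(1-p_0^M) + \ln p_k^T\right] \\
  &= n_0\ln p_0^M + \sum_{k=1}^{\infty} n_k\ln(1-p_0^M) + \sum_{k=1}^{\infty} n_k\left[\ln p_k - \ln(1-p_0)\right],
\end{aligned}$$

where the last statement follows from $p_k^T = p_k/(1-p_0)$. The three parameters
of the (a, b, 1) class are $p_0^M$, $a$, and $b$, where $a$ and $b$ determine $p_1, p_2, \ldots$.
Then it can be seen that

$$l = l_0 + l_1$$

with

$$l_0 = n_0\ln p_0^M + \sum_{k=1}^{\infty} n_k\ln(1-p_0^M),$$

$$l_1 = \sum_{k=1}^{\infty} n_k\left[\ln p_k - \ln(1-p_0)\right],$$

where $l_0$ depends only on the parameter $p_0^M$ and $l_1$ is independent of $p_0^M$,
depending only on $a$ and $b$. This simplifies the maximization because

$$\frac{\partial l}{\partial p_0^M} = \frac{\partial l_0}{\partial p_0^M}
 = \frac{n_0}{p_0^M} - \sum_{k=1}^{\infty}\frac{n_k}{1-p_0^M}
 = \frac{n_0}{p_0^M} - \frac{n-n_0}{1-p_0^M},$$

resulting in

$$\hat{p}_0^M = \frac{n_0}{n},$$

the proportion of observations at zero. This is the natural estimator because
$p_0^M$ represents the probability of an observation of zero.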

Similarly, because the likelihood factors conveniently, the estimation of $a$
and $b$ is independent of $p_0^M$. Note that although $a$ and $b$ are parameters,
maximization should not be done with respect to them. That is because not
all values of $a$ and $b$ produce admissible probability distributions.$^{13}$ For the
zero-modified Poisson distribution, the relevant part of the loglikelihood is

$$\begin{aligned}
l_1 &= \sum_{k=1}^{\infty} n_k\left[\ln\frac{e^{-\lambda}\lambda^k}{k!} - \ln(1-e^{-\lambda})\right] \\
    &= -(n-n_0)\lambda + \left(\sum_{k=1}^{\infty} k n_k\right)\ln\lambda - (n-n_0)\ln(1-e^{-\lambda}) + c \\
    &= -(n-n_0)\left[\lambda + \ln(1-e^{-\lambda})\right] + n\bar{x}\ln\lambda + c,
\end{aligned}$$

where $\bar{x} = \frac{1}{n}\sum_{k=0}^{\infty} k n_k$ is the sample mean, $n = \sum_{k=0}^{\infty} n_k$, and $c$ is
independent of $\lambda$. Hence,

$$\frac{\partial l_1}{\partial\lambda}
 = -(n-n_0) - (n-n_0)\frac{e^{-\lambda}}{1-e^{-\lambda}} + \frac{n\bar{x}}{\lambda}
 = -\frac{n-n_0}{1-e^{-\lambda}} + \frac{n\bar{x}}{\lambda}.$$

Setting this to zero yields

$$\bar{x}\,(1-e^{-\lambda}) = \frac{n-n_0}{n}\,\lambda. \tag{12.22}$$

By graphing each side as a function of $\lambda$, it is clear that, if $n_0 > 0$, there exist
exactly two roots: one is $\lambda = 0$, the other is $\lambda > 0$. Equation (12.22) can be
solved numerically to obtain $\hat{\lambda}$. Note that, because $\hat{p}_0^M = n_0/n$ and $p_0 = e^{-\lambda}$,
(12.22) can be rewritten as

$$\bar{x} = \frac{1-\hat{p}_0^M}{1-p_0}\,\lambda. \tag{12.23}$$

Because the right-hand side of (12.23) is the theoretical mean of the
zero-modified Poisson distribution (when $p_0^M$ is replaced with $\hat{p}_0^M$), (12.23) is a
moment equation. Hence, an alternative estimation method yielding the same
results as the maximum likelihood method is to equate $p_0^M$ to the sample
proportion at zero and the theoretical mean to the sample mean. This suggests
that, by fixing the zero probability to the observed proportion at zero and
equating the low order moments, a modified moment method can be used
to get starting values for numerical maximization of the likelihood function.
Because the maximum likelihood method has better asymptotic properties,
it is preferable to use the modified moment method only to obtain starting
values.


13 Maximization can be done with respect to any parameterization because maximum
likelihood estimation is invariant under parameter transformations. However, it is more difficult
to maximize over bounded regions because numerical methods are difficult to constrain and
analytic methods will fail due to a lack of differentiability. Therefore, estimation is usually
done with respect to particular class members, such as the Poisson.
For the purpose of obtaining estimates of the asymptotic variance of the
maximum likelihood estimator of $\lambda$, it is easy to obtain

$$\frac{\partial^2 l_1}{\partial\lambda^2} = (n-n_0)\frac{e^{-\lambda}}{(1-e^{-\lambda})^2} - \frac{n\bar{x}}{\lambda^2},$$

and the expected value is obtained by observing that $E(\bar{x}) = (1-p_0^M)\lambda/(1-e^{-\lambda})$.
Finally, $p_0^M$ may be replaced by its estimator, $n_0/n$. The variance of $\hat{p}_0^M$
is obtained by observing that the numerator, $n_0$, has a binomial distribution
and therefore the variance is $p_0^M(1-p_0^M)/n$.

For the zero-modified binomial distribution,

$$\begin{aligned}
l_1 &= \sum_{k=1}^{m} n_k\left\{\ln\left[\binom{m}{k}q^k(1-q)^{m-k}\right] - \ln\left[1-(1-q)^m\right]\right\} \\
    &= \left(\sum_{k=1}^{m} k n_k\right)\ln q + \sum_{k=1}^{m}(m-k)n_k\ln(1-q) - \sum_{k=1}^{m} n_k\ln\left[1-(1-q)^m\right] + c \\
    &= n\bar{x}\ln q + m(n-n_0)\ln(1-q) - n\bar{x}\ln(1-q) - (n-n_0)\ln\left[1-(1-q)^m\right] + c,
\end{aligned}$$

where $c$ does not depend on $q$ and

$$\frac{\partial l_1}{\partial q} = \frac{n\bar{x}}{q} - \frac{m(n-n_0)}{1-q} + \frac{n\bar{x}}{1-q} - \frac{(n-n_0)\,m(1-q)^{m-1}}{1-(1-q)^m}.$$

Setting this to zero yields

$$\bar{x} = \frac{1-\hat{p}_0^M}{1-p_0}\,mq, \tag{12.24}$$

where we recall that $p_0 = (1-q)^m$. This equation matches the theoretical
mean with the sample mean.
If $m$ is known and fixed, the maximum likelihood estimator of $p_0^M$ is still

$$\hat{p}_0^M = \frac{n_0}{n}.$$

However, even with $m$ known, (12.24) must be solved numerically for $q$. When
$m$ is unknown and also needs to be estimated, the above procedure can be
followed for different values of $m$ until the maximum of the likelihood function
is obtained.

The zero-modified negative binomial (or extended truncated negative
binomial) distribution is a bit more complicated because three parameters need
to be estimated. Of course, the maximum likelihood estimator of $p_0^M$ is
$\hat{p}_0^M = n_0/n$ as before, reducing the problem to the estimation of $r$ and $\beta$.
The part of the loglikelihood relevant to $r$ and $\beta$ is

$$l_1 = \sum_{k=1}^{\infty} n_k\ln p_k - (n-n_0)\ln(1-p_0). \tag{12.25}$$

Hence

$$l_1 = \sum_{k=1}^{\infty} n_k\ln\left[\binom{k+r-1}{k}\left(\frac{1}{1+\beta}\right)^r\left(\frac{\beta}{1+\beta}\right)^k\right]
      - (n-n_0)\ln\left[1-\left(\frac{1}{1+\beta}\right)^r\right]. \tag{12.26}$$

This function needs to be maximized over the $(r, \beta)$ plane to obtain the
maximum likelihood estimates. This can be done numerically using maximization
procedures such as those described in Appendix F. Starting values can be
obtained by the modified moment method by setting $\hat{p}_0^M = n_0/n$ and equating
the first two moments of the distribution to the first two sample moments.
It is generally easier to use raw moments (moments about the origin) than
central moments for this purpose. In practice, it may be more convenient to
maximize (12.25) rather than (12.26) because one can take advantage of the
recursive scheme

$$p_k = p_{k-1}\left(a + \frac{b}{k}\right)$$

in evaluating (12.25). This makes computer programming a bit easier.
For zero-truncated distributions there is no need to estimate the
probability at zero because it is known to be zero. The remaining parameters are
estimated using the same formulas developed for the zero-modified
distributions.


Example 12.58 The data set in Table 12.13 comes from Beard et al. [12].
Determine a model that adequately describes the data.


Table 12.13 Fitted distributions to Beard data


Accidents      Observed   Poisson      Geometric    ZM Poisson        ZM geom.
0              370,412    369,246.9    372,206.5    370,412.0         370,412.0
1               46,545     48,643.6     43,325.8     46,432.1          46,555.2
2                3,935      3,204.1      5,043.2      4,138.6           3,913.6
3                  317        140.7        587.0        245.9             329.0
4                   28          4.6         68.3         11.0              27.7
5                    3          0.1          8.0          0.4               2.3
6                    0          0.0          1.0          0.0               0.2
Parameters                λ: 0.13174   β: 0.13174   p_0^M: 0.87934    p_0^M: 0.87934
                                                    λ: 0.17827        β: 0.091780
Loglikelihood             -171,373     -171,479     -171,160          -171,133


When a Poisson distribution is fitted to it, the resulting fit is very poor.
There is too much probability for one accident and too little at subsequent
values. The geometric distribution is tried as a one-parameter alternative. It
has loglikelihood

$$\begin{aligned}
l &= \sum_{k=0}^{\infty} n_k\ln\left[\frac{1}{1+\beta}\left(\frac{\beta}{1+\beta}\right)^k\right] \\
  &= -n\ln(1+\beta) + \sum_{k=1}^{\infty} k n_k\left[\ln\beta - \ln(1+\beta)\right] \\
  &= -n\ln(1+\beta) + n\bar{x}\left[\ln\beta - \ln(1+\beta)\right] \\
  &= -(n+n\bar{x})\ln(1+\beta) + n\bar{x}\ln\beta,
\end{aligned}$$

where $\bar{x} = \sum_{k=1}^{\infty} k n_k/n$ and $n = \sum_{k=0}^{\infty} n_k$.
Differentiation reveals that the loglikelihood has a maximum at

$$\hat{\beta} = \bar{x}.$$

A qualitative look at the numbers indicates that the zero-modified geometric
distribution matches the data better than the other three models considered.
A formal analysis is done in Example 13.16. □
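The zero-modified Poisson column of Table 12.13 can be checked by solving (12.22) numerically. The sketch below uses bisection on the observed Beard counts; the bracket endpoints and iteration count are arbitrary choices made for illustration.

```python
import math

# Accidents -> observed number of policies, from Table 12.13.
observed = {0: 370412, 1: 46545, 2: 3935, 3: 317, 4: 28, 5: 3}
n = sum(observed.values())
n0 = observed[0]
xbar = sum(k * nk for k, nk in observed.items()) / n

# Maximum likelihood estimate of the modified probability at zero.
p0m_hat = n0 / n

def g(lam):
    """Difference of the two sides of (12.22); its positive root is lambda_hat."""
    return xbar * (-math.expm1(-lam)) - (n - n0) / n * lam

# g is positive just above zero and negative for large lambda,
# so bisection on this bracket isolates the positive root.
lo, hi = 1e-8, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid

lam_hat = 0.5 * (lo + hi)
print(f"p0M_hat = {p0m_hat:.5f}, lambda_hat = {lam_hat:.5f}")
```

The trivial root at $\lambda = 0$ is excluded by starting the bracket slightly above zero.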


12.5.5 Compound models 


For the method of moments, the first few moments can be matched with the
sample moments. The system of equations can be solved to obtain the
moment-based estimators. Note that the number of parameters in the compound
model is the sum of the number of parameters in the primary and secondary
distributions. The first two theoretical moments for compound distributions
are

$$E(S) = E(N)E(M),$$
$$\mathrm{Var}(S) = E(N)\,\mathrm{Var}(M) + E(M)^2\,\mathrm{Var}(N).$$

These results were developed in Chapter 6. The first three moments for the
compound Poisson distribution are given in (4.27).
Maximum likelihood estimation is also carried out as before. The
loglikelihood to be maximized is

$$l = \sum_{k=0}^{\infty} n_k\ln g_k.$$

When $g_k$ is the probability of a compound distribution, the loglikelihood can
be maximized numerically. The first and second derivatives of the
loglikelihood can be obtained by using approximate differentiation methods as applied
directly to the loglikelihood function at the maximum value.


Example 12.59 Determine various properties of the Poisson-zero-truncated
geometric distribution. This distribution is also called the Polya-Aeppli
distribution.


For the zero-truncated geometric distribution the pgf is

$$P_2(z) = \frac{[1-\beta(z-1)]^{-1} - (1+\beta)^{-1}}{1-(1+\beta)^{-1}}$$

and therefore the pgf of the Polya-Aeppli distribution is

$$P(z) = P_1[P_2(z)]
 = \exp\left(\lambda\left\{\frac{[1-\beta(z-1)]^{-1} - (1+\beta)^{-1}}{1-(1+\beta)^{-1}} - 1\right\}\right)
 = \exp\left\{\lambda\,\frac{[1-\beta(z-1)]^{-1} - 1}{1-(1+\beta)^{-1}}\right\}.$$

The mean is

$$P'(1) = \lambda(1+\beta)$$

and the variance is

$$P''(1) + P'(1) - [P'(1)]^2 = \lambda(1+\beta)(1+2\beta).$$

Alternatively, $E(N) = \mathrm{Var}(N) = \lambda$, $E(M) = 1+\beta$, and $\mathrm{Var}(M) = \beta(1+\beta)$.
Then,

$$E(S) = \lambda(1+\beta),$$
$$\mathrm{Var}(S) = \lambda\beta(1+\beta) + \lambda(1+\beta)^2 = \lambda(1+\beta)(1+2\beta).$$

From Theorem 4.51, the probability at zero is

$$g_0 = P_1(0) = e^{-\lambda}.$$

The successive values of $g_k$ are computed easily using the compound Poisson
recursion

$$g_k = \frac{\lambda}{k}\sum_{j=1}^{k} j f_j\, g_{k-j}, \qquad k = 1, 2, 3, \ldots, \tag{12.27}$$

where $f_j = \beta^{j-1}/(1+\beta)^j$, $j = 1, 2, \ldots$. For any values of $\lambda$ and $\beta$, the
loglikelihood function can be easily evaluated. □


Table 12.14 Automobile claims by year


Year    Exposure    Claims
1986    2,145       207
1987    2,452       227
1988    3,112       341
1989    3,458       335
1990    3,698       362
1991    3,872       359
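The recursion (12.27) is straightforward to program. The following sketch computes Polya-Aeppli probabilities and checks them against the mean and variance formulas of Example 12.59. The parameter values $\lambda = 1$ and $\beta = 0.5$ and the truncation point 200 are hypothetical, chosen only for illustration.

```python
import math

def polya_aeppli_pf(lam, beta, kmax):
    """Return [g_0, ..., g_kmax] using the compound Poisson recursion (12.27)."""
    # Zero-truncated geometric secondary pf: f_j = beta^(j-1) / (1+beta)^j.
    f = [0.0] + [beta ** (j - 1) / (1.0 + beta) ** j for j in range(1, kmax + 1)]
    g = [math.exp(-lam)]   # g_0 = exp(-lambda)
    for k in range(1, kmax + 1):
        g.append(lam / k * sum(j * f[j] * g[k - j] for j in range(1, k + 1)))
    return g

# Hypothetical parameter values, for illustration only.
lam, beta = 1.0, 0.5
g = polya_aeppli_pf(lam, beta, 200)

total = sum(g)
mean = sum(k * gk for k, gk in enumerate(g))
var = sum(k * k * gk for k, gk in enumerate(g)) - mean ** 2

print(f"sum = {total:.10f}")
print(f"mean = {mean:.6f} (formula {lam * (1 + beta):.6f})")
print(f"variance = {var:.6f} (formula {lam * (1 + beta) * (1 + 2 * beta):.6f})")
```

Given observed counts $n_k$, the loglikelihood is then the sum of $n_k \ln g_k$ over the observed frequencies, which can be passed to any numerical maximizer.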


Example 13.17 provides a data set for which the Polya-Aeppli distribution
is a good choice.

Another useful compound Poisson distribution is the Poisson-extended
truncated negative binomial (Poisson-ETNB) distribution. Although it does
not matter if the secondary distribution is modified or truncated, we prefer the
truncated version here so that the parameter $r$ may be extended.$^{14}$ Special
cases are: $r = 1$, which is the Poisson-geometric (also called Polya-Aeppli);
$r \to 0$, which is the Poisson-logarithmic (negative binomial); and $r = -0.5$,
which is called the Poisson-inverse Gaussian. This name is not consistent
with the others. Here the inverse Gaussian distribution is a mixing
distribution (see Section 4.6.9). Example 13.18 provides a data set for which the
Poisson-inverse Gaussian distribution is a good choice.


12.5.6 Effect of exposure on maximum likelihood estimation 


In Section 4.6.11 the effect of exposure on discrete distributions was discussed. 
When aggregate data from a large group of insureds is obtained, maximum 
likelihood estimation is still possible. The following example illustrates this 
for the Poisson distribution. 


14 This does not contradict Theorem 4.54. When $-1 < r < 0$, it is still the case that
changing the probability at zero will not produce new distributions. What is true is that
there is no probability at zero which will lead to an ordinary (a, b, 0) negative binomial
secondary distribution.


Example 12.60 Determine the maximum likelihood estimate of the Poisson
parameter for the data in Table 12.14.


Let $\lambda$ be the Poisson parameter for a single exposure. If year $k$ has $e_k$
exposures, then the number of claims has a Poisson distribution with parameter
$\lambda e_k$. If $n_k$ is the number of claims in year $k$, the likelihood function is

$$L = \prod_{k=1}^{6}\frac{e^{-\lambda e_k}(\lambda e_k)^{n_k}}{n_k!}.$$

The maximum likelihood estimate is found by

$$l = \ln L = \sum_{k=1}^{6}\left[-\lambda e_k + n_k\ln(\lambda e_k) - \ln(n_k!)\right],$$

$$\frac{\partial l}{\partial\lambda} = \sum_{k=1}^{6}\left(-e_k + n_k\lambda^{-1}\right) = 0,$$

$$\hat{\lambda} = \frac{\sum_{k=1}^{6} n_k}{\sum_{k=1}^{6} e_k} = \frac{1{,}831}{18{,}737} = 0.09772.$$ □

In this example the answer is what we expected it to be, the average number
of claims per exposure. This technique will work for any distribution in the
(a, b, 0)$^{15}$ and compound classes. But care must be taken in the interpretation
of the model. For example, if we use a negative binomial distribution, we are
assuming that each exposure unit produces claims according to a negative
binomial distribution. This is different from assuming that total claims have
a negative binomial distribution because they arise from individuals who each
have a Poisson distribution but with different parameters.
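The estimate in Example 12.60 can be confirmed with a short calculation. This sketch uses the exposures and claims of Table 12.14 and also evaluates the score (the derivative of the loglikelihood) at the estimate; the function names are illustrative.

```python
import math

# Exposures and claims for 1986-1991, from Table 12.14.
exposures = [2145, 2452, 3112, 3458, 3698, 3872]
claims = [207, 227, 341, 335, 362, 359]

# Maximum likelihood estimate: total claims divided by total exposure.
lam_hat = sum(claims) / sum(exposures)

def loglik(lam):
    """Poisson loglikelihood when year k has mean lam * e_k."""
    return sum(-lam * e + nk * math.log(lam * e) - math.lgamma(nk + 1)
               for e, nk in zip(exposures, claims))

def score(lam):
    """Derivative of the loglikelihood with respect to lam."""
    return sum(-e + nk / lam for e, nk in zip(exposures, claims))

print(f"lambda_hat = {lam_hat:.5f}")
print(f"loglikelihood at lambda_hat = {loglik(lam_hat):.2f}")
print(f"score at lambda_hat = {score(lam_hat):.2e}")
```

For distributions other than the Poisson the score equation generally has no closed-form solution, and `loglik` would instead be handed to a numerical optimizer.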


12.5.7 Exercises 


12.94 Assume that the binomial parameter m is known. Consider the maxi- 
mum likelihood estimator of q. 

(a) Show that the maximum likelihood estimator is unbiased. 

(b) Determine the variance of the maximum likelihood estimator. 


(c) Show that the asymptotic variance as given in Theorem 12.13 is 
the same as that developed in part (b). 


(d) Determine a simple formula for a confidence interval using (9.4) on 
page 276 that is based on replacing q with ĝ in the variance term. 


(e) Determine a more complicated formula for a confidence interval 
using (9.3) that is not based on such a replacement. This should 
be done in a manner similar to that used in Example 11.12 on page 
309. 


15 For the binomial distribution, the usual problem that m must be an integer remains.


Table 12.15 Data for Exercise 12.96


No. of claims    No. of policies
0                9,048
1                  905
2                   45
3                    2
4+                   0

12.95 Use (12.18) to determine the maximum likelihood estimator of $\beta$ for the
geometric distribution. In addition, determine the variance of the maximum
likelihood estimator and verify that it matches the asymptotic variance as
given in Theorem 12.13.


12.96 A portfolio of 10,000 risks produced the claim counts in Table 12.15. 


(a) Determine the maximum likelihood estimate of $\lambda$ for a Poisson
model and then determine a 95% confidence interval for $\lambda$.


(b) Determine the maximum likelihood estimate of $\beta$ for a geometric
model and then determine a 95% confidence interval for $\beta$.


(c) Determine the maximum likelihood estimate of $r$ and $\beta$ for a
negative binomial model.


(d) Assume that m = 4. Determine the maximum likelihood estimate 
of q of the binomial model. 


(e) Construct 95% confidence intervals for q using the methods devel- 
oped in parts (d) and (e) of Exercise 12.94. 


(f) Determine the maximum likelihood estimate of m and q by con- 
structing a likelihood profile. 


12.97 An automobile insurance policy provides benefits for accidents caused 
by both underinsured and uninsured motorists. Data on 1,000 policies re- 
vealed the information in Table 12.16. 


(a) Determine the maximum likelihood estimate of λ for a Poisson 
model for each of the variables N_1 = number of underinsured claims 
and N_2 = number of uninsured claims. 


(b) Assume that N_1 and N_2 are independent. Use Theorem 4.37 on 
page 74 to determine a model for N = N_1 + N_2. 


12.98 An alternative method of obtaining a model for N in Exercise 12.97 
would be to record the total number of underinsured and uninsured claims 
for each of the 1,000 policies. Suppose this was done and the results were as 
in Table 12.17. 
Table 12.16 Data for Exercise 12.97 

No. of claims    Underinsured    Uninsured 
0                901             947 
1                 92              50 
2                  5               2 
3                  1               1 
4                  1               0 
5+                 0               0 


Table 12.17 Data for Exercise 12.98 

No. of claims    No. of policies 
0                861 
1                121 
2                 13 
3                  3 
4                  1 
5                  0 
6                  1 
7+                 0 


(a) Determine the maximum likelihood estimate of λ for a Poisson 
model. 


(b) The answer to part (a) matched the answer to part (b) of the 
previous exercise. Demonstrate that this must always be so. 

(c) Determine the maximum likelihood estimate of β for a geometric 
model. 


(d) Determine the maximum likelihood estimate of r and β for a 
negative binomial model. 


(e) Assume that m = 7. Determine the maximum likelihood estimate 
of q of the binomial model. 


(f) Determine the maximum likelihood estimates of m and q by 
constructing a likelihood profile. 


12.99 The data in Table 12.18 represent the number of prescriptions filled in 
one year for a group of elderly members of a group insurance plan. 


(a) Determine the maximum likelihood estimate of λ for a Poisson 
model. 


(b) Determine the maximum likelihood estimate of β for a geometric 
model and then determine a 95% confidence interval for β. 


(c) Determine the maximum likelihood estimates of r and β for a 
negative binomial model. 


Table 12.18 Data for Exercise 12.99 

No. of prescriptions   Frequency     No. of prescriptions   Frequency 
0                      82            16-20                  40 
1-3                    49            21-25                  38 
4-6                    47            26-35                  52 
7-10                   47            36+                    91 
11-15                  57 


12.6 BIVARIATE MODELS 


12.6.1 Introduction 


At times a bivariate distribution with dependent variables is the appropriate 
model. One such situation is a joint life annuity or insurance. Here the timing 
of the payments depends on the first or second death of two individuals. 
Because these individuals are often related (typically spouses), the times of 
death will be dependent. As another example, in casualty insurances it is 
common to record the expenses that are directly related to the payment of 
the loss, referred to as the allocated loss adjustment expenses (ALAE). The 
loss and the ALAE are usually strongly positively correlated. 

There are a variety of sources for bivariate and multivariate models. Among 
them are the books by Hutchinson and Lai ([64]), Kotz, Balakrishnan, and 
Johnson ([80]), and Mardia ([89]). However, most of the distributions have 
marginal distributions that are not of interest for actuarial applications or 
have the parameters related in an unsuitable manner (for example, a bivariate 
gamma distribution in which both X and Y must have the same value for α). 
One exception is the bivariate lognormal distribution for which the logarithms 
of the two variables have a bivariate normal distribution. 

Of more interest and practical value are methods which construct bivariate 
models from known marginal distributions. For example, suppose it were 
known that losses have the Pareto distribution and that ALAE have the 
gamma distribution. Then those parameters could be estimated (and the 
models themselves determined) from the marginal data. Then they could be 
combined into a bivariate distribution that introduces a degree of association 
between the two variables. Among the methods available, the copula has re- 
ceived a lot of attention in the actuarial literature and is the only one that 
will be covered here. 
Table 12.19 Twenty-four losses with ALAE 

Loss      ALAE        Loss       ALAE 
 1,500       301       11,750     2,530 
 2,000     3,043       12,500       165 
 2,500       415       14,000       175 
 2,500     4,940       14,750    28,217 
 4,500       395       15,000     2,072 
 5,000        25       17,500     6,328 
 5,750    34,474       19,833       212 
 7,000        50       30,000     2,172 
 7,000    10,593       33,033     7,845 
 7,500        50       44,887     2,178 
 9,000       406       62,500    12,251 
10,000     1,174      210,000     7,357 


12.6.2 Copulas 


Copula distributions are created using a function, also called a copula. This 
function must itself be a legitimate bivariate distribution function over the 
unit square with uniform marginals. Denote the two marginal distribution 
functions F_X(x) and F_Y(y) and the copula function C(u,v). The bivariate 
distribution function created by the three is then 

$$F_{X,Y}(x,y) = C[F_X(x), F_Y(y)].$$

A simple but fairly useless example is the copula C(u,v) = uv. This creates 
the bivariate distribution function F_{X,Y}(x,y) = F_X(x)F_Y(y), which is true 
for independent variables. 

A good general introduction is [42] and an introduction for actuaries can 
be found in [40]. The paper by Frees, Carriere, and Valdez [39] works with 
Frank’s copula for a study of joint lifetimes. An expanded version of the 
example presented here can be found in [78]. The last two cited papers show 
how to write the likelihood function under various truncations and censoring. 


Example 12.61 The loss and ALAE were recorded for each of 24 claims 
(Table 12.19). Determine a model for the joint distribution using Frank’s 
copula with Pareto distributions for both marginals. 


Frank’s copula is (where log_α means the logarithm base α) 

$$C(u,v) = \log_\alpha\left[1 + \frac{(\alpha^u - 1)(\alpha^v - 1)}{\alpha - 1}\right], \qquad (12.28)$$

where the parameter α controls the degree of association between the two 
variables. Values of α less than 1 indicate a positive association, values greater 
than 1 indicate an inverse association, and 1 indicates independence. If we let 
β and θ be the parameters of the marginal Pareto distribution for X (where 
θ is the scale parameter) and let γ and τ be the parameters for Y (with τ the 
scale parameter), the bivariate distribution function is 

$$F_{X,Y}(x,y) = \log_\alpha\left\{1 + \frac{[\alpha^{1-(1+x/\theta)^{-\beta}} - 1][\alpha^{1-(1+y/\tau)^{-\gamma}} - 1]}{\alpha - 1}\right\}.$$


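Taking partial derivatives with respect to x and y provides the joint density 
function 

$$f_{X,Y}(x,y) = \frac{(\alpha - 1)(\ln\alpha)\,\alpha^{2-(1+x/\theta)^{-\beta}-(1+y/\tau)^{-\gamma}}\,\dfrac{\beta\gamma}{\theta\tau}\,(1+x/\theta)^{-\beta-1}(1+y/\tau)^{-\gamma-1}}{\left\{\alpha - 1 + [\alpha^{1-(1+x/\theta)^{-\beta}} - 1][\alpha^{1-(1+y/\tau)^{-\gamma}} - 1]\right\}^2}.$$
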

Starting values for the four Pareto parameters were obtained by finding the 
maximum likelihood estimates for the two marginal distributions. Simplex 
maximization yields the estimates α̂ = 0.133024, β̂ = 2.59889, θ̂ = 36,141.4, 
γ̂ = 0.759943, and τ̂ = 803.839. Because α̂ is less than 1, the estimate indicates 
a positive association, and this could be tested. One way is to use the 
likelihood ratio test discussed in Chapter 13. It turns out that with the small 
sample size there is not sufficient evidence to be sure there is a positive 
association. □ 
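
The copula, the joint distribution function, and the joint density used in Example 12.61 can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the density is written as the mixed partial derivative of (12.28) times the two Pareto densities. The parameter values are the estimates reported in the example.

```python
import math

def frank_copula(u, v, alpha):
    """Frank's copula in the base-alpha form of (12.28); alpha > 0, alpha != 1."""
    ratio = (alpha**u - 1.0) * (alpha**v - 1.0) / (alpha - 1.0)
    return math.log(1.0 + ratio) / math.log(alpha)

def frank_density(u, v, alpha):
    """Mixed partial derivative of frank_copula with respect to u and v."""
    denom = alpha - 1.0 + (alpha**u - 1.0) * (alpha**v - 1.0)
    return (alpha - 1.0) * math.log(alpha) * alpha**(u + v) / denom**2

def pareto_cdf(x, shape, scale):
    return 1.0 - (1.0 + x / scale) ** (-shape)

def pareto_pdf(x, shape, scale):
    return (shape / scale) * (1.0 + x / scale) ** (-shape - 1.0)

def joint_cdf(x, y, alpha, beta, theta, gamma, tau):
    """Bivariate distribution function with Pareto marginals."""
    u = pareto_cdf(x, beta, theta)
    v = pareto_cdf(y, gamma, tau)
    return frank_copula(u, v, alpha)

def joint_pdf(x, y, alpha, beta, theta, gamma, tau):
    """Joint density: copula density times the two marginal densities."""
    u = pareto_cdf(x, beta, theta)
    v = pareto_cdf(y, gamma, tau)
    return (frank_density(u, v, alpha)
            * pareto_pdf(x, beta, theta) * pareto_pdf(y, gamma, tau))

# Estimates reported in Example 12.61.
EST = dict(alpha=0.133024, beta=2.59889, theta=36141.4,
           gamma=0.759943, tau=803.839)

# Loglikelihood contribution of the first claim in Table 12.19
# (loss 1,500 with ALAE 301); summing over all 24 claims gives the
# function that would be maximized.
first_term = math.log(joint_pdf(1500.0, 301.0, **EST))
```

Setting v = 1 in `frank_copula` returns u, which is a quick check that the marginals are uniform.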


A number of results concerning Frank’s copula can be found in the paper 
by Genest [41]. Two are presented here. To simulate an (X, Y) pair, begin 
by simulating the X value from the marginal distribution. This can be done 
using the standard inversion technique. Follow this by simulating a value of 
Y from the conditional distribution of Y given X = x. To do this, first note 
that the distribution function is 


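$$F_{Y|X}(y|x) = \frac{(\partial/\partial x)F_{X,Y}(x,y)}{f_X(x)}.$$

For Frank’s copula we have 

$$\frac{\partial}{\partial x}F_{X,Y}(x,y) = f_X(x)\,\frac{\partial}{\partial u}C(u,v)\Big|_{u=F_X(x),\,v=F_Y(y)} = \frac{f_X(x)\,\alpha^{F_X(x)}[\alpha^{F_Y(y)} - 1]}{\alpha - 1 + [\alpha^{F_X(x)} - 1][\alpha^{F_Y(y)} - 1]}.$$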


To simulate a conditional value of Y using the inversion method discussed in 
Chapter 17, obtain a uniform(0,1) random number r and solve the equation 


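$$\frac{\alpha^{F_X(x)}[\alpha^{F_Y(y)} - 1]}{\alpha - 1 + [\alpha^{F_X(x)} - 1][\alpha^{F_Y(y)} - 1]} = r$$

for F_Y(y) to obtain 

$$\alpha^{F_Y(y)} = 1 + \frac{r(\alpha - 1)}{\alpha^{F_X(x)} - r[\alpha^{F_X(x)} - 1]}$$

or 

$$F_Y(y) = \frac{1}{\ln\alpha}\,\ln\left\{1 + \frac{r(\alpha - 1)}{\alpha^{F_X(x)} - r[\alpha^{F_X(x)} - 1]}\right\}.$$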


The right-hand side is a number and then the distribution function of Y can 
be inverted to solve for the simulated value. 
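
The two-step simulation just described can be sketched in Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, both marginals are taken to be Pareto as in Example 12.61, and the parameter values are the estimates from that example. The conditional distribution function is written in terms of u = F_X(x) and v = F_Y(y).

```python
import math
import random

def pareto_inverse_cdf(p, shape, scale):
    """Solve 1 - (1 + x/scale)^(-shape) = p for x."""
    return scale * ((1.0 - p) ** (-1.0 / shape) - 1.0)

def conditional_cdf(u, v, alpha):
    """F_{Y|X} for Frank's copula, with u = F_X(x) and v = F_Y(y)."""
    au, av = alpha**u, alpha**v
    return au * (av - 1.0) / (alpha - 1.0 + (au - 1.0) * (av - 1.0))

def solve_v(u, r, alpha):
    """Value v = F_Y(y) at which the conditional distribution function equals r."""
    au = alpha**u
    return math.log(1.0 + r * (alpha - 1.0) / (au - r * (au - 1.0))) / math.log(alpha)

def simulate_pair(rng, alpha, beta, theta, gamma, tau):
    """Simulate X by inversion, then Y from its conditional distribution given X."""
    u = rng.random()          # F_X(x) for the simulated X
    r = rng.random()          # uniform number for the conditional step
    x = pareto_inverse_cdf(u, beta, theta)
    v = solve_v(u, r, alpha)  # F_Y(y) for the simulated Y
    y = pareto_inverse_cdf(v, gamma, tau)
    return x, y

# Estimates reported in Example 12.61.
EST = dict(alpha=0.133024, beta=2.59889, theta=36141.4,
           gamma=0.759943, tau=803.839)

rng = random.Random(12345)    # fixed seed so the run is repeatable
pairs = [simulate_pair(rng, **EST) for _ in range(5)]
```

Substituting the value returned by `solve_v` back into `conditional_cdf` recovers r, which confirms the algebra of the inversion.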
The regression function can be found from 


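$$E(Y|X = x) = \int_0^\infty [1 - F_{Y|X}(y|x)]\,dy = \int_0^\infty \left\{1 - \frac{\alpha^{F_X(x)}[\alpha^{F_Y(y)} - 1]}{\alpha - 1 + [\alpha^{F_X(x)} - 1][\alpha^{F_Y(y)} - 1]}\right\} dy,$$

but it is likely that the integral will have to be done numerically. 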


12.6.3 Exercise 


12.100 Consider the data set in Table 12.19. Fit a bivariate distribution using 
Frank’s copula where each marginal distribution has the inverse exponential 
distribution. 


12.7 MODELS WITH COVARIATES 


12.7.1 Introduction 


It may be that the distribution of the random variable of interest depends 
on certain characteristics of the underlying situation. For example, the dis- 
tribution of time to death may be related to the individual’s age, gender, 
smoking status, blood pressure, height, and weight. Or, consider the number 
of automobile accidents a vehicle has in a year. The distribution of this vari- 
able might be related to the number of miles it is driven, where it is driven, 
and various characteristics of the primary driver such as age, gender, marital 
status, and driving history. 


Example 12.62 Suppose we believe that the distribution of the number of 
accidents a driver has in a year is related to the driver’s age and gender. 
Provide three approaches to modeling this situation. 


Of course there is no limit to the number of models that could be considered. 
Three that might be used are given below. 
1. Construct a model for each combination of gender and age. Collect data 
separately for each combination and construct each model separately. 
Either parametric or data-dependent models could be selected. 


2. Construct a single, fully parametric model for this situation. As an 
example, the number of accidents for a given driver could be assumed 
to have the Poisson distribution with parameter λ. The value of λ is 
then assumed to depend on the age x and the gender (g = 1 for males, 
g = 0 for females) in some way such as 

$$\lambda = (\alpha_0 + \alpha_1 x + \alpha_2 x^2)\beta^g.$$


3. Begin with a model for the density, distribution, or hazard rate function 
that is similar to a data-dependent model. Then use the age and gender 
to modify this function. For example, select a survival function S_0(n) 
and then the survival function for a particular driver might be 

$$S(n|x,g) = [S_0(n)]^{(\alpha_0 + \alpha_1 x + \alpha_2 x^2)\beta^g}.$$ □ 


While there is nothing wrong with the first approach, it is not very interest- 
ing. It just asks us to repeat the modeling process over and over as we move 
from one combination to another. The second approach is a single parametric 
model that can also be analyzed with techniques already discussed, but it 
is clearly more parsimonious. The third model’s hybrid nature implies that 
additional effort will be needed to implement it. 

The third model would be a good choice when there is no obvious distri- 
butional model for a given individual. In the case of automobile drivers, the 
Poisson distribution is a reasonable choice and so the second model may be 
the best approach. If the variable is time to death, a data-dependent model 
such as a life table may be appropriate. 

The advantage of the second and third approaches over the first one is that 
for some of the combinations there may be very few observations. In this 
case, the parsimony afforded by the second and third models may allow the 
limited information to still be useful. For example, suppose our task was to 
estimate the 80 entries in a life table running from age 20 through age 99 for 
four gender/smoker combinations. Using the ideas in model 1 above there are 
320 items to estimate. Using the ideas in model 3 there would be 83 items to 
estimate.¹⁶ 


¹⁶ There would be 80 items needed to estimate the survival function for one of the four 
combinations. The other three combinations each add one more item, the power to which 
the survival function is to be raised. 
12.7.2 Proportional hazards models 


A particular model that is relatively easy to work with is the Cox proportional 
hazards model. 


Definition 12.63 Given a baseline hazard rate function h_0(t) and values 
z_1, ..., z_p associated with a particular individual, the Cox proportional 
hazards model for that person is given by the hazard rate function 

$$h(x|z) = h_0(x)c(\beta_1 z_1 + \cdots + \beta_p z_p) = h_0(x)c(\beta^T z),$$

where c(y) is any function that takes on only positive values, z = (z_1, ..., z_p)^T 
is a column vector of the z values (called covariates), and β = (β_1, ..., β_p)^T 
is a column vector of coefficients. 


The only function that will be used here is c(y) = e^y. One advantage of this 
function is that it must be positive. The name for this model is fitting because 
if the ratio of the hazard rate functions for two individuals is taken, the ratio 
will be constant. That is, one person’s hazard rate function is proportional to 
any other person’s hazard rate function. Our goal is to estimate the baseline 
hazard rate function h_0(t) and the vector of coefficients β. 


Example 12.64 Suppose the size of a homeowner’s fire insurance claim as a 
percentage of the house’s value depends upon the age of the house and the type 
of construction (wood or brick). Develop a Cox proportional hazards model 
for this situation. Also, indicate the difference between wood and brick houses 
of the same age. 


Let z_1 = age (a nonnegative whole number) and z_2 = 1 if the construction 
is wood and z_2 = 0 if the construction is brick. Then the hazard rate function 
for a given house is 

$$h(x|z_1,z_2) = h_0(x)e^{\beta_1 z_1 + \beta_2 z_2}.$$

One consequence of this model is that, regardless of the age, the effect of 
switching from brick to wood is the same. For two houses of age z_1 we have 

$$h_{wood}(x) = h_0(x)e^{\beta_1 z_1 + \beta_2} = h_{brick}(x)e^{\beta_2}.$$

The effect on the survival function is 

$$S_{wood}(x) = \exp\left[-\int_0^x h_{wood}(y)\,dy\right] = \exp\left[-\int_0^x h_{brick}(y)e^{\beta_2}\,dy\right] = [S_{brick}(x)]^{\exp(\beta_2)}.$$ □ 


The baseline hazard rate function can be estimated using either a parametric 
model or a data-dependent model. The remainder of the model is 
parametric. In the spirit of this text, we will use maximum likelihood for 
estimation of β_1 and β_2. We will begin with a fully parametric example. 


Table 12.20 Fire insurance payments 

z_1    z_2    Payment 
10     0      70 
20     0      22 
30     0      90* 
40     0      81 
50     0       8 
10     1      51 
20     1      95* 
30     1      55 
40     1      85* 
50     1      93 

*The payment was made at the policy limit. 


Example 12.65 For the fire insurance example, 10 payments are in Table 
12.20. All values are expressed as a percentage of the house’s value. Estimate 
the parameters of the Cox proportional hazards model using maximum likeli- 
hood and both an exponential and a beta distribution for the baseline hazard 
rate function. There is no deductible on these policies, but there is a policy 
limit (which differs by policy). 


In order to construct the likelihood function, we need the density and 
survival functions. Let c_j = exp(β^T z) be the Cox multiplier for the jth 
observation. Then, as noted in the previous example, S_j(x) = S_0(x)^{c_j}, where 
S_0(x) is the baseline distribution. The density function is 

$$f_j(x) = -S_j'(x) = -c_j S_0(x)^{c_j - 1} S_0'(x) = c_j S_0(x)^{c_j - 1} f_0(x).$$

For the exponential distribution, 

$$S_j(x) = [e^{-x/\theta}]^{c_j} = e^{-c_j x/\theta} \quad\text{and}\quad f_j(x) = (c_j/\theta)e^{-c_j x/\theta},$$

and for the beta distribution, 

$$S_j(x) = [1 - \beta(a,b;x)]^{c_j} \quad\text{and}\quad f_j(x) = c_j[1 - \beta(a,b;x)]^{c_j - 1}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1},$$

where β(a,b;x) is the distribution function for a beta distribution with 
parameters a and b [available in Excel® as BETADIST(x,a,b)]. The gamma 
function is available in Excel® as EXP(GAMMALN(a)). For policies with 
payments not at the limit, the contribution to the likelihood function is the 
density function while for those paid at the limit it is the survival function. 
In both cases, the likelihood function is sufficiently complex that it is not 
worth writing out. The parameter estimates for the exponential model are 
β̂_1 = 0.00319, β̂_2 = −0.63722, and θ̂ = 0.74041. The value of the logarithm 
of the likelihood function is −6.1379. For the beta model, the estimates are 
β̂_1 = −0.00315, β̂_2 = −0.77847, â = 1.03706, and b̂ = 0.81442. The value 
of the logarithm of the likelihood function is −4.2155. Using the Schwarz 
Bayesian criterion (see Section 13.5.3), an improvement of ln(10)/2 = 1.1513 
is needed to justify a fourth parameter. The improvement here is 
6.1379 − 4.2155 = 1.9224, so the beta model is preferred. If an 
estimate of the information matrix is desired, the only reasonable strategy is 
to take numerical derivatives of the loglikelihood. □ 
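
The exponential-baseline loglikelihood of Example 12.65 is short enough to sketch in Python. The sketch below makes one assumption that the text does not state explicitly: the payments in Table 12.20 are rescaled from percentages to fractions of the house's value, which is the scale on which the reported estimates reproduce the reported loglikelihood. The function and variable names are hypothetical.

```python
import math

# Table 12.20 as (age z1, wood indicator z2, payment as a fraction of
# value, paid at policy limit). The division by 100 is an assumption.
FIRE = [
    (10, 0, 0.70, False), (20, 0, 0.22, False), (30, 0, 0.90, True),
    (40, 0, 0.81, False), (50, 0, 0.08, False), (10, 1, 0.51, False),
    (20, 1, 0.95, True),  (30, 1, 0.55, False), (40, 1, 0.85, True),
    (50, 1, 0.93, False),
]

def loglik_exponential(beta1, beta2, theta, data=FIRE):
    """Loglikelihood for the Cox model with an exponential baseline."""
    total = 0.0
    for z1, z2, x, at_limit in data:
        c = math.exp(beta1 * z1 + beta2 * z2)   # Cox multiplier c_j
        log_survival = -c * x / theta           # ln S_j(x)
        if at_limit:
            total += log_survival               # censored: survival function
        else:
            total += math.log(c / theta) + log_survival  # uncensored: density
    return total

# Evaluate at the estimates reported in the example.
ll_exponential = loglik_exponential(0.00319, -0.63722, 0.74041)

# Schwarz Bayesian criterion comparison with the reported beta-model value
# (the beta loglikelihood is taken from the text, not recomputed here).
ll_beta_reported = -4.2155
threshold = math.log(len(FIRE)) / 2.0
beta_preferred = (ll_beta_reported - ll_exponential) > threshold
```

A general-purpose optimizer applied to `loglik_exponential` would be needed to find the estimates themselves; the sketch only evaluates the function at the reported values.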


An alternative is to construct a data-dependent model for the baseline 
hazard rate. Let R(y_j) be the set of observations that are in the risk set for 
uncensored observation y_j.¹⁷ Rather than obtain the true likelihood value, it 
is easier to obtain what is called the partial likelihood value. It is a conditional 
value. Rather than asking, “What is the probability of observing a value of 
y_j?” we ask, “Given that it is known there is an uncensored observation of 
y_j, what is the probability that it was the policy that had that value? Do 
this conditioned on equalling or exceeding that value.” This method allows 
us to estimate the β coefficients separately from the baseline hazard rate. 
Notation can become a bit awkward here. Let j* identify the observation 
that produced the uncensored observation of y_j. Then the contribution to the 
likelihood function for that policy is 

$$\frac{f_{j^*}(y_j)/S_{j^*}(y_j)}{\sum_{i\in R(y_j)} f_i(y_j)/S_i(y_j)} = \frac{c_{j^*}f_0(y_j)/S_0(y_j)}{\sum_{i\in R(y_j)} c_i f_0(y_j)/S_0(y_j)} = \frac{c_{j^*}}{\sum_{i\in R(y_j)} c_i}.$$


Example 12.66 Use the partial likelihood to estimate β_1 and β_2. 


The ordered, uncensored values are 8, 22, 51, 55, 70, 81, and 93. The 
calculation of the contribution to the likelihood function is in Table 12.21. 

The product is maximized when β̂_1 = −0.00373 and β̂_2 = −0.91994 and 
the logarithm of the partial likelihood is −11.9889. When β_1 is forced to 
be zero, the maximum is at β̂_2 = −0.93708 and the logarithm of the partial 
likelihood is −11.9968. There is no evidence in this sample that age of the 
house has an impact when using this model. □ 
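
The logarithm of the partial likelihood in Example 12.66 can be evaluated with a few lines of Python. This is a sketch with hypothetical names, using the data of Table 12.20; it evaluates the function at the reported estimates rather than performing the maximization.

```python
import math

# Table 12.20 as (age z1, wood indicator z2, payment in percent,
# paid at policy limit).
FIRE = [
    (10, 0, 70, False), (20, 0, 22, False), (30, 0, 90, True),
    (40, 0, 81, False), (50, 0, 8, False),  (10, 1, 51, False),
    (20, 1, 95, True),  (30, 1, 55, False), (40, 1, 85, True),
    (50, 1, 93, False),
]

def log_partial_likelihood(beta1, beta2, data=FIRE):
    """Sum over uncensored payments of ln(c_j* / sum of c_i in the risk set)."""
    total = 0.0
    for z1, z2, y, at_limit in data:
        if at_limit:
            continue  # censored payments appear only inside risk sets
        # Risk set: every policy whose recorded value equals or exceeds y.
        risk_sum = sum(math.exp(beta1 * a + beta2 * w)
                       for a, w, value, _ in data if value >= y)
        total += beta1 * z1 + beta2 * z2 - math.log(risk_sum)
    return total

full = log_partial_likelihood(-0.00373, -0.91994)   # both coefficients free
restricted = log_partial_likelihood(0.0, -0.93708)  # beta_1 forced to zero
lr_statistic = 2.0 * (full - restricted)
```

If the likelihood ratio idea is applied to the partial likelihood, `lr_statistic` is roughly 0.016, far below the 3.841 critical value for one degree of freedom, which is consistent with the conclusion that age has no detectable effect.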


Table 12.21 Fire insurance likelihood 

Value   y    c                           Contribution to L 
8       8    c_1 = exp(50β_1)            c_1/(c_1 + ... + c_10) 
22      22   c_2 = exp(20β_1)            c_2/(c_2 + ... + c_10) 
51      51   c_3 = exp(10β_1 + β_2)      c_3/(c_3 + ... + c_10) 
55      55   c_4 = exp(30β_1 + β_2)      c_4/(c_4 + ... + c_10) 
70      70   c_5 = exp(10β_1)            c_5/(c_5 + ... + c_10) 
81      81   c_6 = exp(40β_1)            c_6/(c_6 + ... + c_10) 
85           c_7 = exp(40β_1 + β_2) 
90           c_8 = exp(30β_1) 
93      93   c_9 = exp(50β_1 + β_2)      c_9/(c_9 + c_10) 
95           c_10 = exp(20β_1 + β_2) 


Three issues remain. One is to estimate the baseline hazard rate function, 
one is to deal with the case where there are multiple observations at the same 
value, and the final one is to estimate the variances of estimators. For the 
second problem, there are a number of approaches in the literature. The 
question raised earlier could be rephrased as “Given that it is known there 
are s_j uncensored observations of y_j, what is the probability that it was the 
s_j policies that actually had that value? Do this conditioned on equalling 
or exceeding that value.” A direct interpretation of this statement would 
have the numerator reflect the probability of the s_j observations that were 
observed. The denominator would be based on all subsets of R(y_j) with s_j 
members. This is a lot of work. A simplified version due to Breslow treats 
each of the s_j observations separately but, for the denominator, uses the same 
risk set for all of them. The effect is to require no change from the algorithm 
introduced above. 


¹⁷ Recall from Section 11.1 that y_1, y_2, ... represent the ordered, unique values from the set 
of uncensored observations. The risk set was also defined in that section. 


Example 12.67 In the previous example, suppose that the observation of 81 
had actually been 70. Give the contribution to the partial likelihood function 
for these two observations. 


Using the notation from that example, the contribution for the first 
observation of 70 would still be c_5/(c_5 + ... + c_10). However, the second observation 
of 70 would now contribute c_6/(c_5 + ... + c_10). Note that the numerator has 
not changed (it is still c_6); however, the denominator reflects the fact that 
there are six observations in R(70). □ 


With regard to estimating the hazard rate function, we first note that the 
cumulative hazard rate function is 

$$H(t|z) = \int_0^t h(u|z)\,du = \int_0^t h_0(u)c\,du = H_0(t)c.$$

To employ an analog of the Nelson-Aalen estimate, we use 

$$\hat{H}_0(t) = \sum_{y_j \le t} \frac{s_j}{\sum_{i\in R(y_j)} c_i}.$$

That is, the outer sum is taken over all uncensored observations less than or 
equal to t. The numerator is the number of observations having an uncensored 
value equal to y_j and the denominator, rather than having the number in 
the risk set, adds their c values. As usual, the baseline survival function is 
estimated as Ŝ_0(t) = exp[−Ĥ_0(t)]. 


Table 12.22 Fire insurance baseline survival function 

Value  y    c        Jump                                  Ĥ_0(y)   Ŝ_0(y) 
8      8    0.8300   1/(0.8300 + ... + 0.3699) = 0.1597    0.1597   0.8524 
22     22   0.9282   1/(0.9282 + ... + 0.3699) = 0.1841    0.3438   0.7091 
51     51   0.3840   1/(0.3840 + ... + 0.3699) = 0.2220    0.5658   0.5679 
55     55   0.3564   1/(0.3564 + ... + 0.3699) = 0.2427    0.8086   0.4455 
70     70   0.9634   1/(0.9634 + ... + 0.3699) = 0.2657    1.0743   0.3415 
81     81   0.8615   1/(0.8615 + ... + 0.3699) = 0.3572    1.4315   0.2390 
85          0.3434 
90          0.8942 
93     93   0.3308   1/(0.3308 + 0.3699) = 1.4271          2.8586   0.0574 
95          0.3699 


Example 12.68 For the continuing example (using the original values), 
estimate the baseline survival function and then estimate the probability that 
a claim for a 35-year-old wood house will exceed 80% of the house’s value. 
Compare this to the value obtained from the beta distribution model obtained 
earlier. 


Using the estimates obtained earlier, the 10 c values are as given in Table 
12.22. Also included is the jump in the cumulative hazard estimate, followed 
by the estimate of the cumulative hazard function itself. Values for that 
function apply from the given y value up to, but not including, the next y 
value. 

For the house as described, c = exp[−0.00373(35) − 0.91994(1)] = 0.34977. 
The estimated probability is 0.3415^{0.34977} = 0.68674. From the beta 
distribution, Ŝ_0(0.8) = 0.27732 and c = exp[−0.00315(35) − 0.77847(1)] = 0.41118, 
which gives an estimated probability of 0.27732^{0.41118} = 0.59015. □ 
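
The data-dependent part of Example 12.68 can be reproduced with a short Python sketch. The names are hypothetical, and the coefficients are the rounded partial likelihood estimates from Example 12.66, so results agree with Table 12.22 only to about three decimal places.

```python
import math

# Table 12.20 as (age z1, wood indicator z2, payment in percent,
# paid at policy limit).
FIRE = [
    (10, 0, 70, False), (20, 0, 22, False), (30, 0, 90, True),
    (40, 0, 81, False), (50, 0, 8, False),  (10, 1, 51, False),
    (20, 1, 95, True),  (30, 1, 55, False), (40, 1, 85, True),
    (50, 1, 93, False),
]
BETA = (-0.00373, -0.91994)  # rounded estimates from Example 12.66

def multiplier(z1, z2, beta=BETA):
    """Cox multiplier c = exp(beta_1 z_1 + beta_2 z_2)."""
    return math.exp(beta[0] * z1 + beta[1] * z2)

def baseline_cumulative_hazard(t, data=FIRE, beta=BETA):
    """Nelson-Aalen analog: each uncensored value y <= t adds 1 / (sum of c
    over its risk set); tied values therefore add s_j / (sum of c)."""
    total = 0.0
    for _, _, y, at_limit in data:
        if at_limit or y > t:
            continue
        risk_sum = sum(multiplier(a, w, beta)
                       for a, w, value, _ in data if value >= y)
        total += 1.0 / risk_sum
    return total

def survival(t, z1, z2, data=FIRE, beta=BETA):
    """Estimated probability that the payment percentage exceeds t."""
    h0 = baseline_cumulative_hazard(t, data, beta)
    return math.exp(-h0 * multiplier(z1, z2, beta))

h0_at_80 = baseline_cumulative_hazard(80)   # compare with 1.0743 in Table 12.22
prob_wood_35 = survival(80, 35, 1)          # compare with 0.68674 in the text
```

Because the estimate is a step function, any t from 70 up to but not including 81 gives the same value as t = 80.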
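With regard to variance estimates, the logarithm of the partial likelihood 
function is 

$$l(\beta) = \sum_{j^*} \ln\left[\frac{c_{j^*}}{\sum_{i\in R(y_j)} c_i}\right],$$

where the sum is taken over all observations j* that produced an uncensored 
value. Taking the first partial derivative with respect to β_g produces 

$$\frac{\partial}{\partial\beta_g}l(\beta) = \sum_{j^*}\left[\frac{1}{c_{j^*}}\frac{\partial c_{j^*}}{\partial\beta_g} - \frac{1}{\sum_{i\in R(y_j)} c_i}\,\frac{\partial}{\partial\beta_g}\sum_{i\in R(y_j)} c_i\right].$$

To simplify this expression, note that 

$$\frac{\partial c_{j^*}}{\partial\beta_g} = \frac{\partial e^{\beta_1 z_{j^*1} + \beta_2 z_{j^*2} + \cdots + \beta_p z_{j^*p}}}{\partial\beta_g} = z_{j^*g}c_{j^*},$$

where z_{j*g} is the value of z_g for subject j*. The derivative is 

$$\frac{\partial}{\partial\beta_g}l(\beta) = \sum_{j^*}\left[z_{j^*g} - \frac{\sum_{i\in R(y_j)} z_{ig}c_i}{\sum_{i\in R(y_j)} c_i}\right].$$

The negative second partial derivative is 

$$-\frac{\partial^2}{\partial\beta_h\,\partial\beta_g}l(\beta) = \sum_{j^*}\left[\frac{\sum_{i\in R(y_j)} z_{ig}z_{ih}c_i}{\sum_{i\in R(y_j)} c_i} - \frac{\left(\sum_{i\in R(y_j)} z_{ig}c_i\right)\left(\sum_{i\in R(y_j)} z_{ih}c_i\right)}{\left(\sum_{i\in R(y_j)} c_i\right)^2}\right].$$

Using the estimated values, these partial derivatives provide an estimate of 
the information matrix. 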


Example 12.69 Obtain the information matrix and estimated covariance 
matrix for the continuing example. Then use this to produce a 95% 
confidence interval for the relative risk of a wood house versus a brick house of the 
same age. 


Consider the entry in the outer sum for the observation with z_1 = 50 
and z_2 = 1. The risk set contains this observation (with a value of 93 and 
c = 0.330802) and the censored observation with z_1 = 20 and z_2 = 1 (with a 
value of 95 and c = 0.369924). For the derivative with respect to β_1 and β_2 
the entry is 

$$\frac{50(1)(0.330802) + 20(1)(0.369924)}{0.330802 + 0.369924} - \frac{[50(0.330802) + 20(0.369924)][1(0.330802) + 1(0.369924)]}{(0.330802 + 0.369924)^2} = 0.$$

Summing such items and doing the same for the other partial derivatives yield 
the information matrix and its inverse, the covariance matrix. 

$$I = \begin{bmatrix} 1171.054 & 5.976519 \\ 5.976519 & 1.322283 \end{bmatrix}, \qquad \widehat{\mathrm{Var}} = \begin{bmatrix} 0.000874 & -0.00395 \\ -0.00395 & 0.774125 \end{bmatrix}.$$

The relative risk is the ratio of the c values for the two cases. For a house of 
age x, the relative risk of wood versus brick is e^{β_1 x + β_2}/e^{β_1 x} = e^{β_2}. A 95% 
confidence interval for β_2 is −0.91994 ± 1.96√0.774125 or (−2.6444, 0.80455). 
Exponentiating the endpoints gives the confidence interval for the relative 
risk, (0.07105, 2.2357). □ 
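
The information matrix of Example 12.69 follows mechanically from the negative second partial derivatives of the log partial likelihood, so it is a natural candidate for a short Python sketch. The names below are hypothetical, and the rounded coefficient estimates are used, so the entries match the reported matrices only approximately.

```python
import math

# Table 12.20 as (age z1, wood indicator z2, payment in percent,
# paid at policy limit).
FIRE = [
    (10, 0, 70, False), (20, 0, 22, False), (30, 0, 90, True),
    (40, 0, 81, False), (50, 0, 8, False),  (10, 1, 51, False),
    (20, 1, 95, True),  (30, 1, 55, False), (40, 1, 85, True),
    (50, 1, 93, False),
]
BETA = (-0.00373, -0.91994)  # rounded estimates from Example 12.66

def information_matrix(beta, data=FIRE):
    """Negative second partial derivatives of the log partial likelihood."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    for _, _, y, at_limit in data:
        if at_limit:
            continue  # only uncensored values contribute terms
        # Risk set as (covariate pair, multiplier c_i).
        risk = [((a, w), math.exp(beta[0] * a + beta[1] * w))
                for a, w, value, _ in data if value >= y]
        s0 = sum(c for _, c in risk)
        s1 = [sum(z[g] * c for z, c in risk) for g in range(2)]
        for g in range(2):
            for h in range(2):
                s2 = sum(z[g] * z[h] * c for z, c in risk)
                info[g][h] += s2 / s0 - s1[g] * s1[h] / s0**2
    return info

def invert_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

info = information_matrix(BETA)
cov = invert_2x2(info)

# 95% interval for beta_2, then for the relative risk exp(beta_2).
half_width = 1.96 * math.sqrt(cov[1][1])
ci_beta2 = (BETA[1] - half_width, BETA[1] + half_width)
ci_relative_risk = (math.exp(ci_beta2[0]), math.exp(ci_beta2[1]))
```

The interval for the relative risk contains 1, which is another way of seeing that this small sample cannot distinguish wood from brick construction with confidence.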


12.7.3 The generalized linear and accelerated failure time models 


The proportional hazards model requires a particular relationship between 
survival functions. For actuarial purposes it may not be the most appropri- 
ate because it is difficult to interpret the meaning of multiplying a hazard 
rate function by a constant (or, equivalently, raising a survival function to 
a power).¹⁸ It may be more useful to relate the covariates to a quantity of 
direct interest, such as the expected value. Linear models, such as the 
standard multiple regression model, are inadequate because they tend to rely on 
the normal distribution, a model not suitable for most phenomena of interest 
to actuaries. The generalized linear model drops that restriction and so may 
be more useful. A comprehensive reference is [90] and actuarial papers using 
the model include [47], [61], [93], and [97]. The definition of this model given 
below is slightly more general than the usual one. 


Definition 12.70 Suppose a parametric distribution has parameters μ and 
θ, where μ is the mean and θ is a vector of additional parameters. Let its cdf 
be F(x|μ,θ). The mean must not depend on the additional parameters and 
the additional parameters must not depend on the mean. Let z be a vector of 
covariates for an individual, let β be a vector of coefficients, and let η(μ) and 
c(y) be functions. The generalized linear model then states that the random 
variable, X, has as its distribution function 

$$F(x|z,\beta,\theta) = F(x|\mu,\theta),$$

where μ is such that η(μ) = c(β^T z). 


The model indicates that the mean of an individual observation is related to 
the covariates through a particular set of functions. Normally, these functions 
do not involve parameters, but instead are used to provide a good fit or to 
ensure that only legitimate values of μ are encountered. 


¹⁸ However, it is not uncommon in life insurance to incorporate a given health risk (such 
as obesity) by multiplying the values of q_x by a constant. This is not much different from 
multiplying the hazard rate function by a constant. 

Example 12.71 Demonstrate that the ordinary linear regression model is a 
special case of the generalized linear model. 


For ordinary linear regression, X has a normal distribution with μ = μ and 
θ = σ. Both η and c are the identity function, resulting in μ = β^T z. □ 


The model presented here is more general than the one usually used where 
only a few distributions are allowed for X. The reason is that, for these 
distributions, it has been possible to develop the full set of regression tools, 
such as residual analysis. Computer packages that implement the generalized 
linear model use only these distributions. 

For many of the distributions we have been using, the mean is not a 
parameter. However, it could be. For example, we could parameterize the Pareto 
distribution by setting μ = θ/(α − 1) or, equivalently, replacing θ with μ(α − 1). 
The distribution function is now 

$$F(x|\mu,\alpha) = 1 - \left[\frac{\mu(\alpha - 1)}{\mu(\alpha - 1) + x}\right]^{\alpha}, \qquad \mu > 0,\ \alpha > 1.$$

Note the restriction on α in the parameter space. 


Example 12.72 Construct a generalized linear model for the data set in 
Example 12.65 using a beta distribution for the loss model. 


The beta distribution as parameterized in Appendix A has a mean of μ = 
a/(a + b). Let the other parameter be θ = b. One way of linking the covariates 
to the mean is to use η(μ) = μ/(1 − μ) and c(β^T z) = exp(β^T z). Setting these 
equal and solving yields 

$$\mu = \frac{\exp(\beta^T z)}{1 + \exp(\beta^T z)}.$$

Solving the first two equations yields a = bμ/(1 − μ) = b exp(β^T z). Maximum 
likelihood estimation proceeds by using a factor of f(x) for each uncensored 
observation and 1 − F(x) for each censored observation. For each observation, 
the beta distribution uses the parameter b directly, and the parameter a from 
the value of b and the covariates for that observation. Because there is no 
baseline distribution, the expression β^T z must include a constant term. 
Maximizing the likelihood function yields the estimates b̂ = 0.5775, β̂_0 = 0.2130, 
β̂_1 = 0.0018, and β̂_2 = 1.0940. As in Example 12.65, the impact of age is 
negligible. One advantage of this model is that the mean is directly linked to 
the covariates. □ 
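
The way the link functions in Example 12.72 turn covariates into beta parameters can be sketched in Python. The function name is hypothetical; the coefficients are the estimates reported in the example, with the constant term listed first.

```python
import math

B_HAT = 0.5775                       # estimate of the beta parameter b
BETA_HAT = (0.2130, 0.0018, 1.0940)  # constant, age (z1), wood indicator (z2)

def beta_parameters(age, wood):
    """Return (a, b, mean) of the beta distribution for one house."""
    eta = BETA_HAT[0] + BETA_HAT[1] * age + BETA_HAT[2] * wood
    mean = math.exp(eta) / (1.0 + math.exp(eta))  # from mu/(1 - mu) = exp(eta)
    a = B_HAT * math.exp(eta)                     # a = b mu / (1 - mu)
    return a, B_HAT, mean

a_wood, b_wood, mean_wood = beta_parameters(35, 1)
a_brick, b_brick, mean_brick = beta_parameters(35, 0)
```

For either house the ratio a/(a + b) equals the mean returned by the function, so the covariates act on the mean exactly as the model specifies.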


A model that is similar in spirit to the generalized linear model is the 
accelerated failure time model as described below. 


Definition 12.73 The accelerated failure time model is defined from 

$$S(x|z,\beta) = S_0(xe^{-\beta^T z}). \qquad (12.29)$$
Table 12.23 Data for Example 12.74 

            Male (0)               Female (1) 
Age     100    125    150      100    125    150 
50       13     12     85        3     12     49 
51       11     21     95        7     13     53 
52        8      8    105        8     13     69 
53       10     20    113       12     16     61 
54        8     11    109       12     15     60 
55       13     22    126        8     12     68 
56       19     16    142       11     11     96 
57        9     19    145        5     19     97 
58       17     23    155        5     17     93 
59       14     28    182        9     14     96 

To see that, provided the mean exists, it is a generalized linear model, first 
note that (assuming S(0) = 1), 

$$E(X|z,\beta) = \int_0^\infty S_0(xe^{-\beta^T z})\,dx = \int_0^\infty e^{\beta^T z}S_0(y)\,dy = \exp(\beta^T z)E_0(X),$$

thus relating the mean of the distribution to the covariates. The name comes 
from the fact that the covariates effectively change the age. A person age 
x whose covariates are z has a future lifetime with the same distribution as 
a person for whom z = 0 and is age xe^{−β^T z}. If the baseline distribution 
has a scale parameter, then the effect of the covariates is to multiply that 
scale parameter by a constant. So, if θ is the scale parameter for the baseline 
distribution then a person with covariate z will have the same distribution, 
but with scale parameter exp(β^T z)θ. Unlike the generalized linear model, it 
is not necessary for the mean to exist before this model can be used. 


Example 12.74 A mortality study at ages 50-59 included people of both gen- 
ders and with systolic blood pressure of 100, 125, or 150. For each of the 6 
combinations and at each of the 10 ages, 1,000 people were observed and the 
number of deaths recorded. The data appear in Table 12.23. Develop and 
estimate the parameters for an accelerated failure time model based on the 
Gompertz distribution. 


The Gompertz distribution has hazard rate function h(x) = Be” which 
implies a survival function of So(x) = exp[—-B(c” — 1)/Inc| as the baseline 
distribution. Let the covariates be zı = 0 for males and zı = 1 for females and 
let z2 be the blood pressure. For an individual insured, let y = exp(G,21 + 
22). The accelerated failure time model implies that 


S(aly) = So (z) = exp Ee 2 ; 
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Let c* = c!/7 and let B* = B/y. Then, 


PAED ay | Be), 


seh) = exp |- AE as 


and so the distribution remains Gompertz with new parameters as indicated. 
For each age, 


o Serin, [ Beee) 
ehars e e 


If there are d, deaths at age x, the contribution to the loglikelihood function 
(where a binomial distribution has been assumed for the number of deaths) is 


dz In gz + (1000 = dz) In(1 = qz). 


The likelihood function is maximized at $B = 0.000243$, $c = 1.00866$,
$\beta_1 = 0.110$, and $\beta_2 = -0.0144$. Being female multiplies the expected lifetime (from
birth) by a factor of $\exp(0.110) = 1.116$. An increase of 25 in blood pressure
lowers the expected lifetime by $1 - \exp[-25(0.0144)] = 0.302$, or a 30.2%
decrease. □
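The quantities in Example 12.74 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and argument names are hypothetical, the parameter values are the estimates quoted in the example, and the observed death counts from Table 12.23 are not reproduced, so no likelihood is maximized here.

```python
import math

def baseline_survival(x, B, c):
    """Gompertz baseline S_0(x) = exp[-B(c^x - 1)/ln c]."""
    return math.exp(-B * (c ** x - 1.0) / math.log(c))

def gompertz_aft_qx(x, B, c, gamma):
    """One-year death probability at age x under the accelerated model."""
    c_star = c ** (1.0 / gamma)   # c* = c^(1/gamma)
    b_star = B / gamma            # B* = B/gamma
    return 1.0 - math.exp(-b_star * c_star ** x * (c_star - 1.0) / math.log(c_star))

def loglik_contribution(deaths, q, exposed=1000):
    """Binomial loglikelihood term d ln q + (exposed - d) ln(1 - q)."""
    return deaths * math.log(q) + (exposed - deaths) * math.log(1.0 - q)

# Estimates quoted in the example.
B, c, beta1, beta2 = 0.000243, 1.00866, 0.110, -0.0144

# Hypothetical illustration: a female (z1 = 1) with blood pressure 125, age 50.
gamma = math.exp(beta1 * 1 + beta2 * 125)
q50 = gompertz_aft_qx(50, B, c, gamma)

# Same probability computed directly from S_0(x/gamma), as a consistency check.
q50_direct = 1.0 - baseline_survival(51 / gamma, B, c) / baseline_survival(50 / gamma, B, c)

female_factor = math.exp(beta1)               # multiplier on expected lifetime
bp_decrease = 1.0 - math.exp(25 * beta2)      # relative drop for +25 blood pressure
print(q50, female_factor, bp_decrease)
```

Summing `loglik_contribution` over all ages and covariate combinations would give the loglikelihood to be maximized over the four parameters.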


12.7.4 Exercises


12.101 Suppose the 40 observations in Data Set D2 in Chapter 10 were from
four types of policyholders. Observations 1, 5, ... are from male smokers,
observations 2, 6, ... are from male nonsmokers, observations 3, 7, ... are
from female smokers, and observations 4, 8, ... are from female nonsmokers.
You are to construct a model for the time to surrender and then use the model
to estimate the probability of surrendering in the first year for each of the four
cases. Construct each of the following three models:


(a) Use four different Nelson—Aalen estimates, keeping the four groups 
separate. 


(b) Use a proportional hazards model where the baseline distribution 
has the exponential distribution. 


(c) Use a proportional hazards model with an empirical estimate of 
the baseline distribution. 


12.102 (*) The duration of a strike follows a Cox proportional hazards model 
in which the baseline distribution has an exponential distribution. The only 
variable used is the index of industrial production. When the index has a 
value of 10, the mean duration of a strike is 0.2060 years. When the index has 
a value of 25, the median duration is 0.0411 years. Determine the probability 
that a strike will have a duration of more than one year if the index has a 
value of 5. 


12.103 (*) A Cox proportional hazards model has $z_1 = 1$ for males and $z_1 = 0$
for females and $z_2 = 1$ for adults and $z_2 = 0$ for children. The maximum
likelihood estimates of the coefficients are $\beta_1 = 0.25$ and $\beta_2 = -0.45$. The
covariance matrix of the estimators is

$$\begin{bmatrix} 0.36 & 0.10 \\ 0.10 & 0.20 \end{bmatrix}.$$

Determine a 95% confidence interval for the relative risk of a male child subject
compared to a female adult subject.


12.104 (*) Four insureds were observed from birth to death. The two from
Class A died at times 1 and 9 while the two from Class B died at times 2 and
4. A proportional hazards model uses $z_1 = 1$ for Class B and 0 for Class A.
Let $b = \beta_1$. Estimate the cumulative hazard rate at time 3 for a member of
Class A.


12.105 (*) A Cox proportional hazards model has three covariates. The life
that died first has values 1, 0, 0 for $z_1, z_2, z_3$. The second to die has values
0, 1, 0 and the third to die has values 0, 0, 1. Determine the partial likelihood
function (as a function of $\beta_1$, $\beta_2$, and $\beta_3$).


12.106 Repeat Example 12.72 using only construction type and not age. 


12.107 Repeat Example 12.74 using a proportional hazards model with a 
Gompertz baseline distribution. 


12.108 Repeat Example 12.74 using an accelerated failure time model with 
a gamma baseline distribution. 


Model selection 


13.1 INTRODUCTION 


When using data to build a model, the process must end with the announce- 
ment of a “winner.” While qualifications, limitations, caveats, and other 
attempts to escape full responsibility are appropriate, and often necessary, a 
commitment to a solution is often required. In this chapter we look at a
variety of ways to evaluate a model and compare competing models. But we must
also remember that whatever model we select, it is only an approximation of
reality. This is reflected in the following modeler's motto¹:


All models are wrong, but some models are useful. 


Thus, our goal is to determine a model that is good enough to use to 
answer the question. The challenge here is that the definition of good enough 
will depend on the particular application. Another important modeling point 
is that a solid understanding of the question will guide you to the answer. 
The following quote from John Tukey [131, pp. 13-14] sums this up:


Far better an approximate answer to the right question, which is often 
vague, than an exact answer to the wrong question, which can always 
be made precise. 


¹ It is usually attributed to George Box.


Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.

In this chapter, a specific modeling strategy will be considered. Our
preference is to have a single approach that can be used for any probabilistic
modeling situation. A consequence is that for any particular modeling
situation there may be a better (more reliable or more accurate) approach. For
example, while maximum likelihood is a good estimation method for most
settings, it may not be the best² for certain distributions. A literature search
will turn up methods that have been optimized for specific distributions, but
they will not be mentioned here. Similarly, many of the hypothesis tests used
here give approximate results. For specific cases, better approximations, or
maybe even exact results, are available. They will also be bypassed. The goal
here is to outline a method that will give reasonable answers most of the time
and be adaptable to a variety of situations.

This chapter assumes the reader has a basic understanding of statistical 
hypothesis testing as reviewed in Chapter 9. The remaining sections cover 
a variety of evaluation and selection tools. Each tool has its own strengths 
and weaknesses, and it is possible for different tools to lead to different mod- 
els. This makes modeling as much art as science. At times, in real-world 
applications, the model’s purpose may lead the analyst to favor one tool over 
another.


13.2 REPRESENTATIONS OF THE DATA AND MODEL


All the approaches to be presented attempt to compare the proposed model 
to the data or to another model. The proposed model is represented by 
either its density or distribution function or perhaps some functional of these 
quantities such as the limited expected value function or the mean residual life 
function. The data can be represented by the empirical distribution function 
or a histogram. The graphs are easy to construct when there is individual, 
complete data. When there is grouping or observations have been truncated or 
censored, difficulties arise. Here, the only cases to be covered are those where 
all the data have been truncated at the same value (which could be zero) 
and are all censored at the same value (which could be infinity). Extensions 
to the case of multiple truncation or censoring points are detailed in [109].³
It should be noted that the need for such representations applies only to 
continuous models. For discrete data, issues of censoring, truncation, and 
grouping rarely apply. The data can easily be represented by the relative or 
cumulative frequencies at each possible observation. 


² There are many definitions of "best." Combining the Cramér-Rao lower bound with
Theorem 12.13 indicates that maximum likelihood estimators are asymptotically optimal
using unbiasedness and minimum variance as the definition of best.

³ Because the Kaplan-Meier estimate can be used to represent data with multiple
truncation or censoring points, constructing graphical comparisons of the model and data
is not difficult. The major challenge is generalizing the hypothesis tests to this situation.

Table 13.1 Data Set B with highest value changed

   27     82    115    126    155    161    243    294    340    384
  457    680    855    877    974  1,193  1,340  1,884  2,558  3,476


With regard to representing the data, the empirical distribution function
will be used for individual data and the histogram will be used for grouped
data.

In order to compare the model to truncated data, we begin by noting 
that the empirical distribution begins at the truncation point and represents 
conditional values (that is, they are the distribution and density function 
given that the observation exceeds the truncation point). In order to make a 
comparison to the empirical values, the model must also be truncated. Let 
the truncation point in the data set be t. The modified functions are 


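$$F^*(x) = \begin{cases} 0, & x < t, \\[4pt] \dfrac{F(x) - F(t)}{1 - F(t)}, & x \ge t, \end{cases}$$

$$f^*(x) = \begin{cases} 0, & x < t, \\[4pt] \dfrac{f(x)}{1 - F(t)}, & x \ge t. \end{cases}$$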


In this chapter, when a distribution function or density function is indi- 
cated, a subscript equal to the sample size indicates that it is the empirical 
model (from Kaplan—Meier, Nelson—Aalen, the ogive, etc.) while no adorn- 
ment or the use of an asterisk (*), indicates the estimated parametric model. 
There is no notation for the true, underlying distribution because it is un- 
known and unknowable. 


13.3 GRAPHICAL COMPARISON OF THE DENSITY AND 
DISTRIBUTION FUNCTIONS 


The most direct way to see how well the model and data match up is to plot 
the respective density and distribution functions. 


Example 13.1 Consider Data Sets B and C. However, for this example and 
all that follow, in Data Set B replace the value at 15,743 with 3,476 (this is to 
allow the graphs to fit comfortably on a page). These data sets are reproduced 
here in Tables 13.1 and 13.2. Truncate Data Set B at 50 and Data Set C at 
7,500. Estimate the parameter of an exponential model for each data set. Plot 
the appropriate functions and comment on the quality of the fit of the model. 
Repeat this for Data Set B censored at 1,000 (without any truncation).

Table 13.2 Data Set C


Payment range Number of payments 


0-7,500 99 
7,500-17,500 42 
17,500-32,500 29 
32,500-67,500 28 
67,500-125,000 17 
125,000-300,000 9 
Over 300,000 3 


Fig. 13.1 Model vs. data cdf plot for Data Set B truncated at 50. [Plot omitted; title: Exponential fit; curves: Model and Empirical; horizontal axis: x.]


For Data Set B, there are 19 observations (the first observation is
removed due to truncation). A typical contribution to the likelihood function
is $f(82)/[1 - F(50)]$. The maximum likelihood estimate of the exponential
parameter is $\hat\theta = 802.32$. The empirical distribution function starts at 50 and
jumps 1/19 at each data point. The distribution function, using a truncation
point of 50, is


$$F^*(x) = \frac{1 - e^{-x/802.32} - (1 - e^{-50/802.32})}{1 - (1 - e^{-50/802.32})} = 1 - e^{-(x-50)/802.32}.$$


Figure 13.1 presents a plot of these two functions.
The fit is not as good as we might like because the model understates the
distribution function at smaller values of x and overstates the distribution
function at larger values of x. This is not good because it means that tail
probabilities are understated.

Fig. 13.2 Model vs. data density plot for Data Set C truncated at 7,500. [Plot omitted; title: Exponential fit; horizontal axis: x.]

For Data Set C, the likelihood function uses the truncated values. For
example, the contribution to the likelihood function for the first interval is

$$\left[\frac{F(17{,}500) - F(7{,}500)}{1 - F(7{,}500)}\right]^{42}.$$

The maximum likelihood estimate is $\hat\theta = 44{,}253$. The height of the first
histogram bar is

$$\frac{42}{128(17{,}500 - 7{,}500)} = 0.0000328$$

and the last bar is for the interval from 125,000 to 300,000 (a bar cannot be
constructed for the interval from 300,000 to infinity). The density function
must be truncated at 7,500 and becomes

$$f^*(x) = \frac{f(x)}{1 - F(7{,}500)} = \frac{44{,}253^{-1} e^{-x/44{,}253}}{1 - (1 - e^{-7{,}500/44{,}253})} = \frac{e^{-(x - 7{,}500)/44{,}253}}{44{,}253}, \quad x > 7{,}500.$$

The plot of the density function versus the histogram is given in Figure 13.2.

The exponential model understates the early probabilities. It is hard to
tell from the picture how the curves compare above 125,000.
For Data Set B modified with a limit, the maximum likelihood estimate is
$\hat\theta = 718.00$. When constructing the plot, the empirical distribution function
must stop at 1,000. The plot appears in Figure 13.3.

Fig. 13.3 Model vs. data cdf plot for Data Set B censored at 1,000. [Plot omitted; title: Exponential fit; curves: Model and Empirical.]


Once again, the exponential model does not fit well. □
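The two Data Set B estimates quoted in Example 13.1 can be checked with a short Python sketch. It relies on two standard closed forms for the exponential distribution (an assumption of this sketch, not a derivation given in the text): with left truncation at t the estimate is the average excess over t, and with right censoring at u it is the total of the observed (capped) values divided by the number of uncensored observations. The function names are hypothetical.

```python
import math

# Data Set B with the largest value replaced by 3,476 (Table 13.1).
DATA_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]

def exp_mle_truncated(data, t):
    """Exponential MLE when only values above t are observed."""
    kept = [x for x in data if x > t]
    return sum(x - t for x in kept) / len(kept)

def exp_mle_censored(data, u):
    """Exponential MLE when values are right censored at u."""
    n_uncensored = sum(1 for x in data if x < u)
    return sum(min(x, u) for x in data) / n_uncensored

def truncated_exp_cdf(x, theta, t):
    """F*(x): exponential cdf conditioned on exceeding t."""
    return 0.0 if x < t else 1.0 - math.exp(-(x - t) / theta)

theta_truncated = exp_mle_truncated(DATA_B, 50)
theta_censored = exp_mle_censored(DATA_B, 1000)
f_star_82 = truncated_exp_cdf(82, theta_truncated, 50)
print(round(theta_truncated, 2), round(theta_censored, 2), round(f_star_82, 4))
```

The printed values can be compared with the 802.32 and 718.00 reported in the example. The grouped Data Set C estimate requires numerical maximization and is not attempted here.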


When the model's distribution function is close to the empirical
distribution function, it is difficult to make small distinctions. Among the many
ways to amplify those distinctions, two will be presented here. The first is
to simply plot the difference of the two functions. That is, if $F_n(x)$ is the
empirical distribution function and $F^*(x)$ is the model distribution function,
plot $D(x) = F_n(x) - F^*(x)$.


Example 13.2 Plot D(x) for the previous example. 


For Data Set B truncated at 50, the plot appears in Figure 13.4. The lack 
of fit for this model is magnified in this plot. 

There is no corresponding plot for grouped data. For Data Set B censored
at 1,000, the plot must again end at that value. It appears in Figure 13.5.
The lack of fit continues to be apparent. □


Another way to highlight any differences is the p-p plot, which is also
called a probability plot. The plot is created by ordering the observations as
$x_1 \le \cdots \le x_n$. A point is then plotted corresponding to each value. The
coordinates to plot are $(F_n(x_j), F^*(x_j))$.⁴ If the model fits well, the plotted
points will be near the 45° line running from (0,0) to (1,1). However, for this
to be the case, a different definition of the empirical distribution function is
needed. It can be shown that the expected value of $F_n(x_j)$ is $j/(n + 1)$ and
therefore the empirical distribution should be that value and not the usual
$j/n$. If two observations have the same value, either plot both points (they
would have the same "y" value but different "x" values) or plot a single value
by averaging the two "x" values.


⁴ In the first edition of this text this plot was incorrectly called a q-q plot. There is a plot
that goes by that name, but it will not be introduced here.

Fig. 13.4 Model vs. data D(x) plot for Data Set B truncated at 50. [Plot omitted; title: Exponential fit.]

Fig. 13.5 Model vs. data D(x) plot for Data Set B censored at 1,000. [Plot omitted; title: Exponential fit.]

Fig. 13.6 p-p plot for Data Set B truncated at 50. [Plot omitted; title: Exponential fit; horizontal axis: F_n(x).]


Example 13.3 Create a p-p plot for the continuing example. 


For Data Set B truncated at 50, $n = 19$ and one of the observed values is
$x = 82$. The empirical value is $F_n(82) = 1/20 = 0.05$. The other coordinate is

$$F^*(82) = 1 - e^{-(82-50)/802.32} = 0.0391.$$

One of the plotted points will be (0.05, 0.0391). The complete picture appears
in Figure 13.6. From the lower left part of the plot it is clear that the
exponential model places less probability on small values than the data call
for. A similar plot can be constructed for Data Set B censored at 1,000 and it
appears in Figure 13.7. This plot ends at about 0.75 because that is the highest
probability observed prior to the censoring point at 1,000. There are no
empirical values at higher probabilities. Again, the exponential model tends to
underestimate the empirical values. □
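The p-p coordinates described above can be generated with a brief Python sketch. It assumes the fitted truncated exponential from Example 13.1 (parameter 802.32, truncation point 50) and uses j/(n + 1) as the empirical value; the function names are hypothetical, and plotting is left to whatever tool is at hand.

```python
import math

# Data Set B truncated at 50 (19 values remain).
DATA_B_TRUNCATED = [82, 115, 126, 155, 161, 243, 294, 340, 384, 457,
                    680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]
THETA = 802.32   # estimate quoted in Example 13.1
T = 50.0         # truncation point

def model_cdf(x):
    """Truncated exponential F*(x) = 1 - exp(-(x - t)/theta)."""
    return 1.0 - math.exp(-(x - T) / THETA)

def pp_points(data, cdf):
    """Return (j/(n+1), F*(x_j)) for the ordered observations."""
    n = len(data)
    return [(j / (n + 1), cdf(x)) for j, x in enumerate(sorted(data), start=1)]

points = pp_points(DATA_B_TRUNCATED, model_cdf)
for empirical, model in points[:3]:
    print(round(empirical, 4), round(model, 4))
```

Points lying below the 45° line are those where the model assigns less cumulative probability than the data.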


13.3.1 Exercises 


13.1 Repeat Example 13.1 using a Weibull model in place of the exponential
model.


Fig. 13.7 p-p plot for Data Set B censored at 1,000. [Plot omitted; title: Exponential fit.]


13.2 Repeat Example 13.2 for a Weibull model. 


13.3 Repeat Example 13.3 for a Weibull model. 


13.4 HYPOTHESIS TESTS 


A picture may be worth many words, but sometimes it is best to replace the 
impressions conveyed by pictures with mathematical demonstrations. One 
such demonstration is a test of the hypotheses 


$H_0$: The data came from a population with the stated model.
$H_1$: The data did not come from such a population.


The test statistic is usually a measure of how close the model distribution 
function is to the empirical distribution function. When the null hypothesis 
completely specifies the model (for example, an exponential distribution with 
mean 100), critical values are well known. However, it is more often the case 
that the null hypothesis states the name of the model but not its parameters. 
When the parameters are estimated from the data, the test statistic tends to 
be smaller than it would have been had the parameter values been prespecified. 
That is because the estimation method itself tries to choose parameters that 
produce a distribution that is close to the data. In that case, the tests become 
approximate. Because rejection of the null hypothesis occurs for large values 
of the test statistic, the approximation tends to increase the probability of a
Type II error while lowering the probability of a Type I error. For actuarial
modeling this is likely to be an acceptable trade-off.

One method of avoiding the approximation is to randomly divide the sam- 
ple in half. Use one half to estimate the parameters and then use the other 
half to conduct the hypothesis test. Once the model is selected, the full data 
set could be used to reestimate the parameters. 


13.4.1 Kolmogorov—Smirnov test 


Let $t$ be the left truncation point ($t = 0$ if there is no truncation) and let $u$
be the right censoring point ($u = \infty$ if there is no censoring). Then, the test
statistic is

$$D = \max_{t \le x \le u} |F_n(x) - F^*(x)|.$$

This test should only be used on individual data. This is to ensure that
the step function $F_n(x)$ is well defined. Also, the model distribution function
$F^*(x)$ is assumed to be continuous over the relevant range.


Example 13.4 Calculate D for Example 13.1. 


Table 13.3 provides the needed values. Because the empirical distribution
function jumps at each data point, the model distribution function must be
compared both before and after the jump. The values just before the jump
are denoted $F_n(x-)$ in the table. The maximum is $D = 0.1340$.

For Data Set B censored at 1,000, 15 of the 20 observations are uncensored.
Table 13.4 illustrates the needed calculations. The maximum is $D = 0.0991$. □


All that remains is to determine the critical value. Commonly used critical
values for this test are $1.22/\sqrt{n}$ for $\alpha = 0.10$, $1.36/\sqrt{n}$ for $\alpha = 0.05$, and
$1.63/\sqrt{n}$ for $\alpha = 0.01$. When $u < \infty$, the critical value should be smaller
because there is less opportunity for the difference to become large.
Modifications for this phenomenon exist in the literature (see [125], for example, which
also includes tables of critical values for specific null distribution models), and
one such modification is given in [109] but will not be introduced here.


Example 13.5 Complete the Kolmogorov-Smirnov test for the previous ex- 
ample. 


For Data Set B truncated at 50 the sample size is 19. The critical value at
a 5% significance level is $1.36/\sqrt{19} = 0.3120$. Because 0.1340 < 0.3120, the
null hypothesis is not rejected and the exponential distribution is a plausible
model. While it is unlikely that the exponential model is appropriate for
this population, the sample size is too small to lead to that conclusion. For
Data Set B censored at 1,000 the sample size is 20 and so the critical value
is $1.36/\sqrt{20} = 0.3041$ and the exponential model is again viewed as being
plausible. □


5 Among the tests presented here, only the chi-square test has a built-in correction for this
situation. Modifications for the other tests have been developed, but they will not be
presented here.

Table 13.3 Calculation of D for Example 13.4

                                          Maximum
    x     F*(x)    F_n(x-)    F_n(x)    difference
   82    0.0391    0.0000     0.0526      0.0391
  115    0.0778    0.0526     0.1053      0.0275
  126    0.0904    0.1053     0.1579      0.0675
  155    0.1227    0.1579     0.2105      0.0878
  161    0.1292    0.2105     0.2632      0.1340
  243    0.2138    0.2632     0.3158      0.1020
  294    0.2622    0.3158     0.3684      0.1062
  340    0.3033    0.3684     0.4211      0.1178
  384    0.3405    0.4211     0.4737      0.1332
  457    0.3979    0.4737     0.5263      0.1284
  680    0.5440    0.5263     0.5789      0.0349
  855    0.6333    0.5789     0.6316      0.0544
  877    0.6433    0.6316     0.6842      0.0409
  974    0.6839    0.6842     0.7368      0.0529
1,193    0.7594    0.7368     0.7895      0.0301
1,340    0.7997    0.7895     0.8421      0.0424
1,884    0.8983    0.8421     0.8947      0.0562
2,558    0.9561    0.8947     0.9474      0.0614
3,476    0.9860    0.9474     1.0000      0.0386
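The calculations of Examples 13.4 and 13.5 can be sketched in Python as follows. The sketch assumes the fitted exponential parameters quoted earlier (802.32 with truncation at 50, 718.00 with censoring at 1,000) and that censored observations are recorded at the censoring point; the function names are hypothetical.

```python
import math

DATA_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]

def ks_statistic(data, cdf, u=math.inf):
    """Largest |F_n - F*| checked just before and at each uncensored point."""
    n = len(data)
    points = sorted(x for x in data if x < u)
    d = 0.0
    for j, x in enumerate(points):
        fx = cdf(x)
        # F_n is j/n just before the jump at x and (j + 1)/n after it.
        d = max(d, abs(fx - j / n), abs(fx - (j + 1) / n))
    if u < math.inf:
        # At the censoring point compare with the last empirical value.
        d = max(d, abs(cdf(u) - len(points) / n))
    return d

# Truncated at 50: keep values above 50 and use the truncated model cdf.
truncated = [x for x in DATA_B if x > 50]
d_truncated = ks_statistic(truncated, lambda x: 1.0 - math.exp(-(x - 50) / 802.32))
crit_truncated = 1.36 / math.sqrt(len(truncated))

# Censored at 1,000: values above 1,000 are recorded as 1,000.
censored = [min(x, 1000) for x in DATA_B]
d_censored = ks_statistic(censored, lambda x: 1.0 - math.exp(-x / 718.00), u=1000)
crit_censored = 1.36 / math.sqrt(len(censored))

print(round(d_truncated, 4), round(crit_truncated, 4))
print(round(d_censored, 4), round(crit_censored, 4))
```

In both cases the statistic is well below the 5% critical value, which matches the conclusion reached in the example.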


For both this test and the Anderson-Darling test that follows, the
critical values are correct only when the null hypothesis completely specifies the
model. When the data set is used to estimate parameters for the null
hypothesized distribution (as in the example), the correct critical value is smaller.
For both tests, the change depends on the particular distribution that is
hypothesized and maybe even on the particular true values of the parameters.
An indication of how simulation can be used for this situation is presented in
Section 17.2.4.

Table 13.4 Calculation of D for Example 13.4 with censoring

                                          Maximum
    x     F*(x)    F_n(x-)    F_n(x)    difference
   27    0.0369     0.00       0.05       0.0369
   82    0.1079     0.05       0.10       0.0579
  115    0.1480     0.10       0.15       0.0480
  126    0.1610     0.15       0.20       0.0390
  155    0.1942     0.20       0.25       0.0558
  161    0.2009     0.25       0.30       0.0991
  243    0.2871     0.30       0.35       0.0629
  294    0.3360     0.35       0.40       0.0640
  340    0.3772     0.40       0.45       0.0728
  384    0.4142     0.45       0.50       0.0858
  457    0.4709     0.50       0.55       0.0791
  680    0.6121     0.55       0.60       0.0621
  855    0.6960     0.60       0.65       0.0960
  877    0.7052     0.65       0.70       0.0552
  974    0.7425     0.70       0.75       0.0425
1,000    0.7516     0.75       0.75       0.0016


13.4.2 Anderson—Darling test 


This test is similar to the Kolmogorov-Smirnov test, but uses a different
measure of the difference between the two distribution functions. The test
statistic is

$$A^2 = n\int_t^u \frac{[F_n(x) - F^*(x)]^2}{F^*(x)[1 - F^*(x)]}\, f^*(x)\,dx.$$

That is, it is a weighted average of the squared differences between the
empirical and model distribution functions. Note that when $x$ is close to $t$ or to
$u$ the weights might be very large due to the small value of one of the factors
in the denominator. This test statistic tends to place more emphasis on good
fit in the tails than in the middle of the distribution. Calculating with this
formula appears to be challenging. However, for individual data (so this is
another test that does not work for grouped data), the integral simplifies to

$$A^2 = -nF^*(u) + n\sum_{j=0}^{k}[1 - F_n(y_j)]^2\{\ln[1 - F^*(y_j)] - \ln[1 - F^*(y_{j+1})]\} + n\sum_{j=1}^{k}F_n(y_j)^2[\ln F^*(y_{j+1}) - \ln F^*(y_j)],$$

where the unique noncensored data points are $t = y_0 < y_1 < \cdots < y_k < y_{k+1} = u$.
Note that when $u = \infty$ the last term of the first sum is zero
[evaluating the formula as written will ask for ln(0)]. The critical values are
1.933, 2.492, and 3.857 for 10, 5, and 1% significance levels, respectively. As
with the Kolmogorov-Smirnov test, the critical value should be smaller when
$u < \infty$.

Table 13.5 Anderson-Darling test for Example 13.6

 j     y_j     F*(x)     F_n(x)    Summand
 0      50    0.0000    0.0000     0.0399
 1      82    0.0391    0.0526     0.0388
 2     115    0.0778    0.1053     0.0126
 3     126    0.0904    0.1579     0.0332
 4     155    0.1227    0.2105     0.0070
 5     161    0.1292    0.2632     0.0904
 6     243    0.2138    0.3158     0.0501
 7     294    0.2622    0.3684     0.0426
 8     340    0.3033    0.4211     0.0389
 9     384    0.3405    0.4737     0.0601
10     457    0.3979    0.5263     0.1490
11     680    0.5440    0.5789     0.0897
12     855    0.6333    0.6316     0.0099
13     877    0.6433    0.6842     0.0407
14     974    0.6839    0.7368     0.0758
15   1,193    0.7594    0.7895     0.0403
16   1,340    0.7997    0.8421     0.0994
17   1,884    0.8983    0.8947     0.0592
18   2,558    0.9561    0.9474     0.0308
19   3,476    0.9860    1.0000     0.0141
20       ∞    1.0000    1.0000


Example 13.6 Perform the Anderson-Darling test for the continuing example.


For Data Set B truncated at 50, there are 19 data points. The calculation
is in Table 13.5, where "summand" refers to the sum of the corresponding
terms from the two sums. The total is 1.0226 and the test statistic is −19(1) +
19(1.0226) = 0.4292. Because the test statistic is less than the critical value
of 2.492, the exponential model is viewed as plausible.

For Data Set B censored at 1,000, the results are in Table 13.6. The total
is 0.7602 and the test statistic is −20(0.7516) + 20(0.7602) = 0.1713. Because
the test statistic does not exceed the critical value of 2.492, the exponential
model is viewed as plausible. □

Table 13.6 Anderson-Darling calculation for Example 13.6 with censored data

 j     y_j     F*(x)    F_n(x)    Summand
 0       0    0.0000     0.00     0.0376
 1      27    0.0369     0.05     0.0718
 2      82    0.1079     0.10     0.0404
 3     115    0.1480     0.15     0.0130
 4     126    0.1610     0.20     0.0334
 5     155    0.1942     0.25     0.0068
 6     161    0.2009     0.30     0.0881
 7     243    0.2871     0.35     0.0493
 8     294    0.3360     0.40     0.0416
 9     340    0.3772     0.45     0.0375
10     384    0.4142     0.50     0.0575
11     457    0.4709     0.55     0.1423
12     680    0.6121     0.60     0.0852
13     855    0.6960     0.65     0.0093
14     877    0.7052     0.70     0.0374
15     974    0.7425     0.75     0.0092
16   1,000    0.7516     0.75
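The computational form of the Anderson-Darling statistic used in Example 13.6 can be sketched in Python. The sketch assumes individual data, censored values recorded at the censoring point u, and the fitted exponential parameters quoted earlier; the function names are hypothetical. A term whose weight is exactly zero is skipped, which avoids the ln(0) evaluation mentioned in the text.

```python
import math

DATA_B = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]

def anderson_darling(data, cdf, t=0.0, u=math.inf):
    """A^2 from the two-sum formula for individual data."""
    n = len(data)
    uncensored = sorted(set(x for x in data if x < u))
    y = [t] + uncensored + [u]          # y_0 = t, ..., y_{k+1} = u
    k = len(uncensored)
    f_star = [cdf(v) for v in y]

    def f_n(v):
        return sum(1 for x in data if x <= v) / n

    total = 0.0
    for j in range(0, k + 1):
        weight = (1.0 - f_n(y[j])) ** 2
        if weight > 0.0:                # zero weight: term is zero, skip ln(0)
            total += weight * (math.log(1.0 - f_star[j])
                               - math.log(1.0 - f_star[j + 1]))
    for j in range(1, k + 1):
        total += f_n(y[j]) ** 2 * (math.log(f_star[j + 1]) - math.log(f_star[j]))
    return -n * cdf(u) + n * total

truncated = [x for x in DATA_B if x > 50]
a2_truncated = anderson_darling(
    truncated, lambda x: 1.0 - math.exp(-(x - 50) / 802.32), t=50.0)

censored = [min(x, 1000) for x in DATA_B]
a2_censored = anderson_darling(
    censored, lambda x: 1.0 - math.exp(-x / 718.00), t=0.0, u=1000.0)

print(round(a2_truncated, 4), round(a2_censored, 4))
```

Both values can be compared with the 5% critical value of 2.492 given above.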


13.4.3 Chi-square goodness-of-fit test 


Unlike the previous two tests, this test allows for some discretion. It begins
with the selection of $k - 1$ arbitrary values, $t = c_0 < c_1 < \cdots < c_k = \infty$. Let
$\hat p_j = F^*(c_j) - F^*(c_{j-1})$ be the probability a truncated observation falls in the
interval from $c_{j-1}$ to $c_j$. Similarly, let $p_{nj} = F_n(c_j) - F_n(c_{j-1})$ be the same
probability according to the empirical distribution. The test statistic is then

$$\chi^2 = \sum_{j=1}^{k} \frac{n(\hat p_j - p_{nj})^2}{\hat p_j},$$

where $n$ is the sample size. Another way to write the formula is to let $E_j = n\hat p_j$
be the number of expected observations in the interval (assuming that the
hypothesized model is true) and $O_j = np_{nj}$ be the number of observations in
the interval. Then,

$$\chi^2 = \sum_{j=1}^{k} \frac{(E_j - O_j)^2}{E_j}.$$

The critical value for this test comes from the chi-square distribution with
degrees of freedom equal to the number of terms in the sum ($k$) minus 1 minus
the number of estimated parameters. There are a number of rules that have
been proposed for deciding when the test is reasonably accurate. They center
around the values of $E_j = n\hat p_j$. The most conservative states that each must
be at least 5. Some authors claim that values as low as 1 are acceptable. All
agree the test works best when the values are about equal from term to term.
If the data are grouped, there is little choice but to use the groups as given,
although adjacent groups could be combined to increase $E_j$. For individual
data, the data can be grouped for the purpose of performing this test.⁶

Table 13.7 Data Set B truncated at 50

Range            p̂        Expected    Observed    χ²
50-150         0.1172      2.227         3       0.2687
150-250        0.1035      1.966         3       0.5444
250-500        0.2087      3.964         4       0.0003
500-1,000      0.2647      5.029         4       0.2105
1,000-2,000    0.2180      4.143         3       0.3152
2,000-∞        0.0880      1.672         2       0.0644
Total          1          19            19       1.4034


Example 13.7 Perform the chi-square goodness-of-fit test for the exponential 
distribution for the continuing example. 


All three data sets can be evaluated with this test. For Data Set B
truncated at 50, establish boundaries at 50, 150, 250, 500, 1,000, 2,000, and infinity.
The calculations appear in Table 13.7. The total is $\chi^2 = 1.4034$. With four
degrees of freedom (6 rows minus 1 minus 1 estimated parameter) the
critical value for a test at a 5% significance level is 9.4877 (this can be obtained
with the Excel® function CHIINV(.05,4)) and the p-value is 0.8436 [from
CHIDIST(1.4034,4)]. The exponential model is a good fit.

For Data Set B censored at 1,000, the first interval is from 0 to 150 and the
last interval is from 1,000 to ∞. Unlike the previous two tests, the censored
observations can be used. The calculations are in Table 13.8. The total is
$\chi^2 = 0.5951$. With three degrees of freedom (5 rows minus 1 minus 1 estimated
parameter) the critical value for a test at a 5% significance level is 7.8147 and
the p-value is 0.8976. The exponential model is a good fit.

For Data Set C the groups are already in place. The results are given in Table
13.9. The test statistic is $\chi^2 = 61.913$. There are four degrees of freedom for
a critical value of 9.488. The p-value is about $10^{-12}$. There is clear evidence
that the exponential model is not appropriate. A more accurate test would
combine the last two groups (because the expected count in the last group is
less than 1). The group from 125,000 to infinity has an expected count of 8.997
and an observed count of 12 for a contribution of 1.002. The test statistic is
now 16.552 and with three degrees of freedom the p-value is 0.00087. The test
continues to reject the exponential model. □


⁶ Moore [95] cites a number of rules. Among them are (1) an expected frequency of at least
1 for all cells and an expected frequency of at least 5 for 80% of the cells; (2) an average
count per cell of at least 4 when testing at the 1% significance level and an average count
of at least 2 when testing at the 5% significance level; and (3) a sample size of at least 10,
at least 3 cells, and the ratio of the square of the sample size to the number of cells at least
10.

Table 13.8 Data Set B censored at 1,000

Range          p̂        Expected    Observed    χ²
0-150        0.1885      3.771         4       0.0139
150-250      0.1055      2.110         3       0.3754
250-500      0.2076      4.152         4       0.0055
500-1,000    0.2500      5.000         4       0.2000
1,000-∞      0.2484      4.968         5       0.0002
Total        1          20            20       0.5951


Table 13.9 Data Set C

Range              p̂        Expected    Observed    χ²
7,500-17,500     0.2023     25.889        42      10.026
17,500-32,500    0.2293     29.356        29       0.004
32,500-67,500    0.3107     39.765        28       3.481
67,500-125,000   0.1874     23.993        17       2.038
125,000-300,000  0.0689      8.824         9       0.003
300,000-∞        0.0013      0.172         3      46.360
Total            1         128           128      61.913
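The first part of Example 13.7 (Data Set B truncated at 50) can be reproduced with a short Python sketch. It assumes the fitted parameter 802.32 and the interval boundaries stated in the example; the function names are hypothetical. To stay within the standard library, the p-value helper uses the closed form of the chi-square survival function that holds only for an even number of degrees of freedom, so it does not cover the three-degree-of-freedom cases.

```python
import math

DATA_B_TRUNCATED = [82, 115, 126, 155, 161, 243, 294, 340, 384, 457,
                    680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]
THETA, T = 802.32, 50.0
BOUNDS = [50, 150, 250, 500, 1000, 2000, math.inf]

def model_cdf(x):
    """Truncated exponential F*(x); evaluates to 1 at infinity."""
    return 1.0 - math.exp(-(x - T) / THETA)

def chi_square_statistic(data, cdf, bounds):
    """Sum of (E_j - O_j)^2 / E_j over the intervals (c_{j-1}, c_j]."""
    n = len(data)
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        expected = n * (cdf(hi) - cdf(lo))
        observed = sum(1 for x in data if lo < x <= hi)
        total += (expected - observed) ** 2 / expected
    return total

def chi_square_sf_even_df(x, df):
    """P(chi-square_df > x), valid only when df is even."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

chi2 = chi_square_statistic(DATA_B_TRUNCATED, model_cdf, BOUNDS)
df = (len(BOUNDS) - 1) - 1 - 1      # intervals minus 1 minus 1 estimated parameter
p_value = chi_square_sf_even_df(chi2, df)
print(round(chi2, 4), df, round(p_value, 4))
```

Changing `BOUNDS` shows how the choice of intervals, which the text notes is discretionary, affects the statistic.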


Sometimes, the test can be modified to fit different situations. The follow- 
ing example illustrates this for aggregate frequency data. 


Example 13.8 Conduct an approximate goodness-of-fit test for the Poisson 
model determined in Example 12.60. The data are repeated in Table 13.10. 


Table 13.10 Automobile claims by year

Year    Exposure    Claims
1986     2,145       207
1987     2,452       227
1988     3,112       341
1989     3,458       335
1990     3,698       362
1991     3,872       359


For each year we are assuming that the number of claims is the result of
the sum of a number (given by the exposure) of independent and identical
random variables. In that case the central limit theorem indicates that a
normal approximation may be appropriate. The expected count ($E_k$) is the
exposure times the estimated expected value for one exposure unit while the
variance ($V_k$) is the exposure times the estimated variance for one exposure
unit. The test statistic is then

$$Q = \sum_{k} \frac{(n_k - E_k)^2}{V_k}$$

and has an approximate chi-square distribution with degrees of freedom equal
to the number of data points less the number of estimated parameters. The
expected count is $E_k = \lambda e_k$ and the variance is $V_k = \lambda e_k$ also. The test
statistic is

$$Q = \frac{(207 - 209.61)^2}{209.61} + \frac{(227 - 239.61)^2}{239.61} + \frac{(341 - 304.11)^2}{304.11} + \frac{(335 - 337.92)^2}{337.92} + \frac{(362 - 361.37)^2}{361.37} + \frac{(359 - 378.38)^2}{378.38} = 6.19.$$

With five degrees of freedom, the 5% critical value is 11.07 and the Poisson
hypothesis is accepted. □
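The arithmetic of Example 13.8 can be checked with the following Python sketch. It assumes that the Poisson parameter is estimated as total claims divided by total exposure, which is consistent with the expected counts shown above; the variable names are hypothetical.

```python
# Table 13.10: exposure and claim counts by year, 1986 through 1991.
EXPOSURE = [2145, 2452, 3112, 3458, 3698, 3872]
CLAIMS = [207, 227, 341, 335, 362, 359]

# Estimated claim frequency per exposure unit (assumed pooled estimate).
lam = sum(CLAIMS) / sum(EXPOSURE)

# For the Poisson model the expected count and the variance are both lam * e_k.
expected = [lam * e for e in EXPOSURE]
q_stat = sum((n_k - e_k) ** 2 / e_k for n_k, e_k in zip(CLAIMS, expected))

# Data points less the number of estimated parameters.
degrees_of_freedom = len(CLAIMS) - 1

print(round(lam, 6), [round(e, 2) for e in expected], round(q_stat, 2))
```

Most of the statistic comes from 1988, where 341 claims were observed against roughly 304 expected.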


There is one important point to note about these tests. Suppose the sample 
size were to double but sampled values were not much different (imagine each 
number showing up twice instead of once). For the Kolmogorov-Smirnov 
test, the test statistic would be unchanged, but the critical value would be 
smaller. For the Anderson—Darling and chi-square tests, the test statistic 
would double while the critical value would be unchanged. As a result, for 
larger sample sizes, it is more likely that the null hypothesis (and thus the 
proposed model) will be rejected. This should not be surprising. We know that 
the null hypothesis is false (it is extremely unlikely that a simple distribution 
using a few parameters can explain the complex behavior that produced the 
observations) and with a large enough sample size we will have convincing 
evidence of that truth. When using these tests we must remember that, 
although all our models are wrong, some may be useful.


13.4.4 Likelihood ratio test


An alternative question to “Could the population have distribution A?” is
“Is the population more likely to have distribution B than distribution A?”
More formally:

$H_0$: The data came from a population with distribution A.
$H_1$: The data came from a population with distribution B.


In order to perform a formal hypothesis test, distribution A must be a special
case of distribution B, for example, exponential versus gamma. An easy way 
to complete this test is given below. 


Definition 13.9 The likelihood ratio test is conducted as follows. First, let
the likelihood function be written as $L(\theta)$. Let $\theta_0$ be the value of the parameters
that maximizes the likelihood function. However, only values of the parameters
that are within the null hypothesis may be considered. Let $L_0 = L(\theta_0)$. Let
$\theta_1$ be the maximum likelihood estimator where the parameters can vary over
all possible values from the alternative hypothesis and then let $L_1 = L(\theta_1)$.
The test statistic is $T = 2\ln(L_1/L_0) = 2(\ln L_1 - \ln L_0)$. The null hypothesis
is rejected if $T > c$, where $c$ is calculated from $\alpha = \Pr(T > c)$, where $T$ has
a chi-square distribution with degrees of freedom equal to the number of free
parameters in the model from the alternative hypothesis less the number of
free parameters in the model from the null hypothesis.


This test makes some sense. When the alternative hypothesis is true, forc- 
ing the parameter to be selected from the null hypothesis should produce a 
likelihood value that is significantly smaller. 


Example 13.10 You want to test the hypothesis that the population that pro- 
duced Data Set B (using the original largest observation) has a mean that is 
other than 1,200. Assume that the population has a gamma distribution and 


conduct the likelihood ratio test at a 5% significance level. Also, determine 
the p-value. 


The hypotheses are:

$H_0$: gamma with $\mu = 1{,}200$.
$H_1$: gamma with $\mu \ne 1{,}200$.

From earlier work the maximum likelihood estimates are $\hat\alpha = 0.55616$ and
$\hat\theta = 2{,}561.1$. The loglikelihood at the maximum is $\ln L_1 = -162.293$. Next,
the likelihood must be maximized, but only over those values $\alpha$ and $\theta$ for
which $\alpha\theta = 1{,}200$. That means $\alpha$ can be free to range over all positive
numbers but $\theta = 1{,}200/\alpha$. Thus, under the null hypothesis, there is only
one free parameter. The likelihood function is maximized at $\hat\alpha = 0.54955$
and $\hat\theta = 2{,}183.6$. The loglikelihood at this maximum is $\ln L_0 = -162.466$.
The test statistic is $T = 2(-162.293 + 162.466) = 0.346$. For a chi-square
distribution with one degree of freedom, the critical value is 3.8415. Because
0.346 < 3.8415, the null hypothesis is not rejected. The probability that a
chi-square random variable with one degree of freedom exceeds 0.346 is 0.556,
a p-value that indicates little support for the alternative hypothesis. □


Table 13.11 Six useful models for Example 13.11

                               Number of     Negative
Model                          parameters    loglikelihood    $\chi^2$    p-value
Negative binomial              2             5,348.04         8.77        0.0125
ZM logarithmic                 2             5,343.79         4.92        0.1779
Poisson-inverse Gaussian       2             5,343.51         4.54        0.2091
ZM negative binomial           3             5,343.62         4.65        0.0979
Geometric-negative binomial    3             5,342.70         1.96        0.3754
Poisson-ETNB                   3             5,342.51         2.75        0.2525
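As a check on Example 13.10, the following minimal sketch recomputes the test statistic and p-value from the two loglikelihoods. It handles only the one-degree-of-freedom case, using the identity Pr(chi-square with 1 df > t) = erfc(sqrt(t/2)); the function name `lrt_one_df` is hypothetical.

```python
import math

def lrt_one_df(loglik_null, loglik_alt):
    """Likelihood ratio statistic and p-value when the alternative
    hypothesis has exactly one more free parameter than the null."""
    t = 2.0 * (loglik_alt - loglik_null)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(t / 2.0)) if t > 0 else 1.0
    return t, p

# Loglikelihoods from Example 13.10 (restricted and unrestricted gamma fits).
t_stat, p_value = lrt_one_df(-162.466, -162.293)
reject = t_stat > 3.8415   # 5% critical value quoted in the example

print(f"T = {t_stat:.3f}, p-value = {p_value:.3f}, reject = {reject}")
```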


Example 13.11 (Example 4.42 continued) Members of the (a, b, 0) class were 
not sufficient to describe these data. Determine a suitable model. 


Thirteen different distributions were fit to the data. The results of that
process revealed six models with p-values above 0.01 for the chi-square
goodness-of-fit test. Information about those models is given in Table 13.11. The
likelihood ratio test indicates that the three-parameter model with the smallest
negative loglikelihood (Poisson-ETNB) is not significantly better than the
two-parameter Poisson-inverse Gaussian model: twice the difference in
loglikelihoods is 2(5,343.51 - 5,342.51) = 2.00, which is below the critical value
of 3.84 for one degree of freedom. The latter appears to be an
excellent choice. □


Example 13.12 The estimated value of $\beta_1$ in Example 12.65 is small. Perform
a likelihood ratio test using the beta model to see if age has a significant
impact on losses.


The parameters are reestimated forcing $\beta_1$ to be zero. When this is done,
the estimates are $\hat\beta_0 = -0.79193$, $\hat a = 1.03118$, and $\hat b = 0.74249$. The value of
the logarithm of the likelihood function is $-4.2209$. Adding age improves the
loglikelihood by 0.0054, which is not significant. □


It is tempting to use this test when the alternative distribution simply has 
more parameters than the null distribution. In such cases the test is not 
appropriate. For example, it is possible for a two-parameter lognormal model 
to have a higher loglikelihood value than a three-parameter Burr model. This 
produces a negative test statistic, indicating that a chi-square distribution is 
not appropriate. When the null distribution is a limiting (rather than special) 
case of the alternative distribution, the test may still be used, but the test
statistic’s distribution is now a mixture of chi-square distributions (see [120]).
Regardless, it is still reasonable to use the “test” to make decisions in these 
cases, provided it is clearly understood that a formal hypothesis test was not 
conducted. Further examples and exercises using this test to make decisions 
appear in both the next section and the next chapter. 


13.4.5 Exercises 


13.4 Use the Kolmogorov-Smirnov test to see if a Weibull model is appro- 
priate for the data used in Example 13.5. 


13.5 (*) Five observations are made from a random variable. They are 1, 2, 
3, 5, and 13. Determine the value of the Kolmogorov-Smirnov test statistic 
for the null hypothesis that $f(x) = 2x^{-2}e^{-2/x}$, $x > 0$.


13.6 (*) You are given the following five observations from a random sample: 
0.1, 0.2, 0.5, 1.0, and 1.3. Calculate the Kolmogorov—Smirnov test statistic for 
the null hypothesis that the population density function is $f(x) = 2(1 + x)^{-3}$,
$x > 0$.


13.7 Perform the Anderson-Darling test of the Weibull distribution for
Example 13.6.


13.8 Repeat Example 13.7 for the Weibull model. 


13.9 (*) One hundred and fifty policyholders were observed from the time
they arranged a viatical settlement until their death. No observations were
censored. There were 21 deaths in the first year, 27 deaths in the second year,
39 deaths in the third year, and 63 deaths in the fourth year. The survival
model

$$S(t) = 1 - \frac{t(t+1)}{20}, \quad 0 \le t \le 4,$$

is being considered. At a 5% significance level, conduct the chi-square
goodness-of-fit test.


13.10 (*) Each day, for 365 days, the number of claims is recorded. The 
results were 50 days with no claims, 122 days with one claim, 101 days with 
two claims, 92 days with three claims, and no days with four or more claims. 
For a Poisson model determine the maximum likelihood estimate of $\lambda$ and
then perform the chi-square goodness-of-fit test at a 2.5% significance level. 


13.11 (*) During a one-year period, the number of accidents per day was
distributed as given in Table 13.12. Test the hypothesis that the data are
from a Poisson distribution with mean 0.6 using the maximum number of
groups such that each group has at least five expected observations. Use a
significance level of 5%.


Table 13.12 Data for Exercise 13.11

No. of accidents    Days
0                   209
1                   111
2                   33
3                   7
4                   3
5                   2


13.12 Redo Example 13.8 assuming that each exposure unit has a geometric 
distribution. Conduct the approximate chi-square goodness-of-fit test. Is the 
geometric preferable to the Poisson model? 


13.13 Using Data Set B (with the original largest value), determine if a 
gamma model is more appropriate than an exponential model. Recall that an 
exponential model is a gamma model with $\alpha = 1$. Useful values were obtained
in Example 12.8. 


13.14 Use Data Set C to choose a model for the population that produced 
those numbers. Choose from the exponential, gamma, and transformed gamma 
models. Information for the first two distributions was obtained in Example 
12.9 and Exercise 12.21, respectively. 


13.15 Conduct the chi-square goodness-of-fit test for each of the models ob- 
tained in Exercise 12.96. 


13.16 Conduct the chi-square goodness-of-fit test for each of the models ob- 
tained in Exercise 12.98. 


13.17 Conduct the chi-square goodness-of-fit test for each of the models ob- 
tained in Exercise 12.99. 


13.18 For the data in Table 13.20 determine the method of moments esti- 
mates of the parameters of the Poisson—Poisson distribution where the sec- 
ondary distribution is the ordinary (not zero-truncated) Poisson distribution. 
Perform the chi-square goodness-of-fit test using this model. 


13.19 You are given the data in Table 13.13 which represent results from
23,589 automobile insurance policies. The third column headed “fitted model”
represents the expected number of losses for a fitted (by maximum likelihood)
negative binomial distribution.


Table 13.13 Data for Exercise 13.19

Number of      Number of           Fitted
losses, $k$    policies, $n_k$     model
0              20,592              20,596.76
1              2,651               2,631.03
2              297                 318.37
3              41                  37.81
4              7                   4.45
5              0                   0.52
6              1                   0.06
7+             0                   0.00


(a) Perform the chi-square goodness-of-fit test at a significance level
of 5%.

(b) Determine the maximum likelihood estimates of the negative binomial
parameters $r$ and $\beta$. This can be done from the given numbers
without actually maximizing the likelihood function.


13.5 SELECTING A MODEL 


13.5.1 Introduction 


Almost all of the tools are in place for choosing a model. Before outlining 
a recommended approach, two important concepts must be introduced. The 
first is parsimony. The principle of parsimony states that unless there is 
considerable evidence to do otherwise a simpler model is preferred. The reason 
is that a complex model may do a great job of matching the data, but that is 
no guarantee the model matches the population from which the observations 
were sampled. For example, given any set of 10 (x,y) pairs with unique x 
values, there will always be a polynomial of degree 9 or less that goes through 
all 10 points. But if these points were a random sample, it is highly unlikely 
that the population values all lie on that polynomial. However, there may 
be a straight line that comes close to the sampled points as well as the other 
points in the population. This matches the spirit of most hypothesis tests. 
That is, do not reject the null hypothesis (and thus claim a more complex 
description of the population holds) unless there is strong evidence to do so. 

The second concept does not have a name. It states that, if you try enough 
models, one will look good, even if it is not. Suppose I have 900 models at 
my disposal. For most data sets, it is likely that one of them will fit well, but 
this does not help us learn about the population.

Thus, in selecting models, there are two things to keep in mind: 


1. Use a simple model if at all possible.

2. Restrict the universe of potential models.


The methods outlined in the remainder of this section will help with the first 
point. The second one requires some experience. Certain models make more 
sense in certain situations, but only experience can enhance the modeler’s 
senses so that only a short list of quality candidates is considered. 
The section is split into two types of selection criteria. The first set is based
on the modeler’s judgment while the second set is more formal in the sense 
that most of the time all analysts will reach the same conclusions. That is 
because the decisions are made based on numerical measurements rather than 
charts or graphs. 


13.5.2 Judgment-based approaches 


Using one’s own judgment to select models involves one or more of the three 
concepts outlined below. In all cases, the analyst’s experience is critical. 

First, the decision can be based on the various graphs (or tables based on 
the graphs) presented in this chapter.⁷ This allows the analyst to focus on
aspects of the model that are important for the proposed application. For 
example, it may be more important to fit the tail well or it may be more 
important to match the mode or modes. Even if a score-based approach is 
used, it may be appropriate to present a convincing picture to support the 
chosen model. 

Second, the decision can be influenced by the success of particular models 
in similar situations or the value of a particular model for its intended use. 
For example, the 1941 CSO mortality table follows a Makeham distribution 
for much of its range of ages. In a time of limited computing power, such a 
distribution allowed for easier calculation of joint life values. As long as the fit 
of this model was reasonable, this advantage outweighed the use of a different,
but better fitting, model. Similarly, if the Pareto distribution has been used to 
model a particular line of liability insurance both by the analyst’s company 
and by others, it may require more than the usual amount of evidence to 
change to an alternative distribution. 

Third, the situation may completely determine the distribution. For exam- 
ple, suppose a dental insurance contract provides for at most two check-ups 
per year and suppose that individuals make two independent choices each year 
as to whether or not to have a check-up. If each time the probability is q, 
then the distribution must be binomial with m = 2. 

Finally, it should be noted that the more algorithmic approaches outlined 
below do not always agree. In that case judgment is most definitely required, 
if only to decide which algorithmic approach to use. 


⁷ Besides the ones discussed here, there are other plots/tables that could be used. Other
choices are a q-q plot and a comparison of model and empirical limited expected values or
mean residual life functions.


13.5.3 Score-based approaches


Some analysts might prefer an automated process for selecting a model. An
easy way to do that would be to assign a score to each model and the model 
with the best value wins. The following scores are worth considering: 


1. Lowest value of the Kolmogorov-Smirnov test statistic.

2. Lowest value of the Anderson-Darling test statistic.

3. Lowest value of the chi-square goodness-of-fit test statistic.

4. Highest p-value for the chi-square goodness-of-fit test.

5. Highest value of the likelihood function at its maximum.


All but the chi-square p-value have a deficiency with respect to parsimony. 
First, consider the likelihood function. When comparing, say, an exponential 
to a Weibull model, the Weibull model must have a likelihood value that is at 
least as large as the exponential model. They would only be equal in the rare 
case that the maximum likelihood estimate of the Weibull parameter $\tau$ is equal
to 1. Thus, the Weibull model would always win over the exponential model,
a clear violation of the principle of parsimony. For the three test statistics,
there is no assurance that the same relationship will hold, but it seems likely
that, if a more complex model is selected, the fit measure is likely to be better.
The only reason the p-value is immune from this problem is that with more 
complex models the test has fewer degrees of freedom. It is then possible that 
the more complex model will have a smaller p-value. There is no comparable 
adjustment for the first two test statistics listed. 

With regard to the likelihood value, there are two ways to proceed. One is 
to perform the likelihood ratio test and the other is to extract a penalty for 
employing additional parameters. The likelihood ratio test is technically only 
available when one model is a special case of another (for example, Pareto vs 
generalized Pareto). The concept can be turned into an algorithm by using the 
test at a 5% significance level. Begin with the best one-parameter model (the 
one with the highest loglikelihood value). Add a second parameter only if the 
two-parameter model with the highest loglikelihood value shows an increase of 
at least 1.92 (so twice the difference exceeds the critical value of 3.84). Then 
move to three-parameter models. If the comparison is to a two-parameter 
model, a 1.92 increase is again needed. If the early comparison led to keeping 
the one-parameter model, an increase of 3.00 is needed (because the test has 
two degrees of freedom). To add three parameters requires a 3.91 increase, 
four parameters a 4.74 increase, and so on. In the spirit of this chapter, this 
algorithm can be used even for nonspecial cases. However, it would not be
appropriate to claim that a likelihood ratio test was being conducted. 
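The thresholds quoted above (1.92, 3.00, 3.91, 4.74) are half of the 5% chi-square critical values for one to four degrees of freedom. The sketch below reproduces them using only the standard library; the helper names `chi2_sf`, `chi2_critical`, and `required_increase` are hypothetical, and the survival function uses a standard recurrence for integer degrees of freedom rather than a statistics package.

```python
import math

def chi2_sf(x, df):
    """Pr(chi-square with integer df > x), built up from df = 1 or df = 2."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # df = 1
    else:
        q, k = math.exp(-half), 2              # df = 2
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2.0) * math.log(half) - half
                      - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """Value c with Pr(chi-square > c) = alpha, found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def required_increase(extra_params, alpha=0.05):
    """Loglikelihood gain needed to justify extra_params more parameters."""
    return chi2_critical(alpha, extra_params) / 2.0

for extra in range(1, 5):
    print(extra, round(required_increase(extra), 2))
```

Changing `alpha` gives the corresponding thresholds for other significance levels.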

Aside from the issue of special cases, the likelihood ratio test has the same
problem as the other hypothesis tests. Were the sample size to double, the
loglikelihoods would also double, making it more likely that a model with
a higher number of parameters will be selected. This tends to defeat the
parsimony principle. On the other hand, it could be argued that, if we possess
a lot of data, we have the right to consider and fit more complex models. A
method that effects a compromise between these positions is the Schwarz
Bayesian criterion (SBC)⁸ [121], which recommends that when ranking models
a deduction of $(r/2)\ln n$ should be made from the loglikelihood value, where
$r$ is the number of estimated parameters and $n$ is the sample size. Thus,
adding a parameter requires an increase of $0.5\ln n$ in the loglikelihood. For
larger sample sizes, a greater increase is needed, but it is not proportional to
the sample size itself.⁹


Example 13.13 For the continuing example in this chapter, choose between 
the exponential and Weibull models for the data. 


Graphs were constructed in the various examples and exercises. Table 13.14 
summarizes the numerical measures. For the truncated version of Data Set B, 
the SBC is calculated for a sample size of 19, while for the version censored 
at 1,000 there are 20 observations. For both versions of Data Set B, while the 
Weibull offers some improvement, it is not convincing. In particular, neither 
the likelihood ratio test nor the SBC indicates value in the second parameter. 
For Data Set C it is clear that the Weibull model is superior and provides an 
excellent fit. □
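The SBC values used in Example 13.13 follow directly from the loglikelihoods and sample sizes stated there. A minimal sketch, with the hypothetical helper name `sbc`, is:

```python
import math

def sbc(loglik, n_params, n_obs):
    """Schwarz Bayesian criterion: loglikelihood less (r/2) ln n."""
    return loglik - 0.5 * n_params * math.log(n_obs)

# Data Set B truncated at 50 (n = 19): exponential (1 parameter) vs Weibull (2).
exp_trunc = sbc(-146.063, 1, 19)
wei_trunc = sbc(-145.683, 2, 19)

# Data Set B censored at 1,000 (n = 20).
exp_cens = sbc(-113.647, 1, 20)
wei_cens = sbc(-113.647, 2, 20)

print(f"truncated: exponential {exp_trunc:.3f}, Weibull {wei_trunc:.3f}")
print(f"censored:  exponential {exp_cens:.3f}, Weibull {wei_cens:.3f}")
```

In both cases the exponential model has the larger (less negative) SBC, which agrees with the conclusion that the second parameter is not justified for Data Set B. Small differences in the third decimal place from Table 13.14 can arise because the loglikelihoods entered here are already rounded.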


Example 13.14 In Example 4.57 an ad hoc method was used to demonstrate
that the Poisson-ETNB distribution provided a good fit. Use the methods of
this chapter to determine a good model.


The data set is very large and, as a result, requires a very close correspon- 
dence of the model to the data. The results are given in Table 13.15. 

From Table 13.15, it is seen that the negative binomial distribution does
not fit well while the fit of the Poisson-inverse Gaussian is marginal at best
(p = 2.88%). The Poisson-inverse Gaussian is a special case ($r = -0.5$) of
the Poisson-ETNB. Hence, a likelihood ratio test can be formally applied to
determine if the additional parameter $r$ is justified. Because the loglikelihood
increases by 5, which is more than 1.92, the three-parameter model is
a significantly better fit. The chi-square test shows that the Poisson-ETNB
provides an adequate fit. On the other hand, the SBC favors the
Poisson-inverse Gaussian distribution. Given the improved fit in the tail for the
three-parameter model, it seems to be the best choice. □


⁸ In the first edition not only was Schwarz’ name misspelled, but the formula for the penalty
was incorrect. This edition has the correct version.

⁹ There are other information-based decision rules. Section 3 of Brockett [17] promotes the
Akaike information criterion. In a discussion to that paper, Carlin provides support for the
SBC.


Table 13.14 Results for Example 13.13

                    B truncated at 50          B censored at 1,000
Criterion           Exponential    Weibull     Exponential    Weibull
K-S*                0.1340         0.0887      0.0991         0.0991
A-D*                0.4292         0.1631      0.1718         0.1712
$\chi^2$            1.4034         0.3615      0.5951         0.5947
p-value             0.8436         0.9481      0.8976         0.7428
Loglikelihood       -146.063       -145.683    -113.647       -113.647
SBC                 -147.535       -148.628    -115.145       -116.643

                    C
$\chi^2$            61.913         0.3698
p-value             10^-12         0.9464
Loglikelihood       -214.924       -202.077
SBC                 -217.350       -206.929

*K-S and A-D refer to the Kolmogorov-Smirnov and Anderson-Darling
test statistics, respectively.


Table 13.15 Results for Example 13.14

                                      Fitted distributions
No. of     Observed     Negative          Poisson-             Poisson-
claims     frequency    binomial          inverse Gaussian     ETNB
0          565,664      565,708.1         565,712.4            565,661.2
1          68,714       68,570.0          68,575.6             68,721.2
2          5,177        5,317.2           5,295.9              5,171.7
3          365          334.9             344.0                362.9
4          24           18.7              20.8                 29.6
5          6            1.0               1.2                  3.0
6+         0            0.0               0.1                  0.4

Parameters              β = 0.0350662     λ = 0.123304         λ = 0.128895
                        r = 3.57784       β = 0.0712027        β = [not legible]
                                                               r = -0.846872
Chi square              12.13             7.09                 0.29
Degrees of freedom      2                 2                    1
p-value                 <1%               2.88%                58.9%
-Loglikelihood          251,117           251,114              251,109
SBC                     -251,130          -251,127             -251,129


Example 13.15 The following example is taken from Douglas [29], p. 253.
An insurance company’s records for one year show the number of accidents
per day which resulted in a claim to the insurance company for a particular
insurance coverage. The results are in Table 13.16. Determine if a Poisson
model is appropriate.


Table 13.16 Data for Example 13.15

No. of claims/day    Observed no. of days
0                    47
1                    97
2                    109
3                    62
4                    25
5                    16
6                    4
7                    3
8                    2
9+                   0


A Poisson model is fitted to these data. The method of moments and the
maximum likelihood method both lead to the estimate of the mean,

$$\hat\lambda = \frac{742}{365} = 2.0329.$$

The results of a chi-square goodness-of-fit test are in Table 13.17. Any time
such a table is made, the expected count for the last group is

$$E_{k+} = n\hat p_{k+} = n(1 - \hat p_0 - \cdots - \hat p_{k-1}).$$

The last three groups were combined to ensure an expected count of at


least one for each row. The test statistic is 9.93 with six degrees of free- 
dom. The critical value at a 5% significance level is 12.59 and the p-value is 
0.1277. By this test the Poisson distribution is an acceptable model; however, 
it should be noted that the fit is poorest at the large values, and with the 
model understating the observed values, this may be a risky choice. □
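The chi-square calculation for Example 13.15 can be sketched as follows. The daily counts are those of Table 13.16 (the rows for one and two claims agree with the observed column of Table 13.17); the grouping of 7, 8, and 9+ into a single "7+" cell follows the example, and the variable names are hypothetical.

```python
import math

# Observed number of days with 0, 1, ..., 8 and 9+ claims (Table 13.16).
observed = [47, 97, 109, 62, 25, 16, 4, 3, 2, 0]
n_days = sum(observed)

# Method of moments and maximum likelihood both give the sample mean.
lam_hat = sum(k * count for k, count in enumerate(observed)) / n_days

# Cells 0..6 keep their own probability; the last cell (7+) takes the rest,
# that is, 1 - p_0 - ... - p_6.
probs = [math.exp(-lam_hat) * lam_hat ** k / math.factorial(k) for k in range(7)]
probs.append(1.0 - sum(probs))
obs_grouped = observed[:7] + [sum(observed[7:])]

expected = [n_days * p for p in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs_grouped, expected))
deg_freedom = len(obs_grouped) - 1 - 1   # cells less one, less one parameter

print(f"lambda = {lam_hat:.4f}, chi-square = {chi_sq:.2f}, df = {deg_freedom}")
```

Printing the individual terms of the sum shows that more than half of the statistic comes from the final cell, which is the poor tail fit noted in the example.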


Table 13.17 Chi-square goodness-of-fit test for Example 13.15

Claims/day    Observed    Expected    Chi square
0             47          47.8        0.01
1             97          97.2        0.00
2             109         98.8        1.06
3             62          66.9        0.36
4             25          34.0        2.39
5             16          13.8        0.34
6             4           4.7         0.10
7+            5           1.8         5.66
Totals        365         365         9.93


Example 13.16 The data set in Table 12.13 come from Beard et al. [12]
and were previously analyzed in Example 12.58. Determine a model that
adequately describes the data.


Parameter estimates from fitting four models are in Table 12.13. Various
fit measures are given in Table 13.18. Only the zero-modified geometric
distribution passes the goodness-of-fit test. It is also clearly superior according
to the SBC. A likelihood ratio test against the geometric has a test statistic
of 2(171,479 - 171,133) = 692, which with one degree of freedom is clearly
significant. This confirms the qualitative conclusion in Example 12.58. □


Table 13.18 Test results for Example 13.16

                      Poisson       Geometric     ZM Poisson    ZM geometric
Chi square            543.0         643.4         64.8          0.58
Degrees of freedom    2             4             2             2
p-value               <1%           <1%           <1%           74.9%
Loglikelihood         -171,373      -171,479      -171,160      -171,133
SBC                   -171,379.5    -171,485.5    -171,173      -171,146

Example 13.17 The data in Table 13.19, from Simon [122], represent the
observed number of claims per contract for 298 contracts. Determine an
appropriate model.


The Poisson, negative binomial, and Polya-Aeppli distributions are fitted
to the data. The Polya-Aeppli and the negative binomial are both plausible
distributions. The p-value of the chi-square statistic and the loglikelihood both
indicate that the Polya-Aeppli is slightly better than the negative binomial.
The SBC verifies that both models are superior to the Poisson distribution.
The ultimate choice may depend on familiarity, prior use, and computational
convenience of the negative binomial versus the Polya-Aeppli model. □


Table 13.19 Fit of Simon data

                                            Fitted distributions
Number of          Number of                    Negative
claims/contract    contracts    Poisson         binomial       Polya-Aeppli
0                  99           54.0            95.9           98.7
1                  65           92.2            75.8           70.6
2                  57           78.8            50.4           50.2
3                  35           44.9            31.3           32.6
4                  20           19.2            18.8           20.0
5                  10           6.5             11.0           11.7
6                  4            1.9             6.4            6.6
7                  0            0.5             3.7            3.6
8                  3            0.1             2.1            2.0
9                  4            0.0             1.2            1.0
10                 0            0.0             0.7            0.5
11                 1            0.0             0.4            0.3
12+                0            0.0             0.5            0.3

Parameters                      λ = 1.70805     β = 1.15907    λ = 1.10551
                                                r = 1.47364    β = 0.545039
Chi square                      72.64           4.06           2.84
Degrees of freedom              4               5              5
p-value                         <1%             54.05%         72.39%
Loglikelihood                   -577.0          -528.8         -528.5
SBC                             -579.8          -534.5         -534.2


Example 13.18 Consider the data in Table 13.20 on automobile liability
policies in Switzerland taken from Bühlmann [19]. Determine an appropriate
model.


Three models are considered in Table 13.20. The Poisson distribution is
a very bad fit. Its tail is far too light compared with the actual experience.
The negative binomial distribution appears to be much better but cannot be
accepted because the p-value of the chi-square statistic is very small. The large
sample size requires a better fit. The Poisson-inverse Gaussian distribution
provides an almost perfect fit (p-value is large). Note that the Poisson-inverse
Gaussian has two parameters, like the negative binomial. The SBC also favors
this choice. This example shows that the Poisson-inverse Gaussian can have
a much heavier right-hand tail than the negative binomial. □


Table 13.20 Fit of Bühlmann data

                                        Fitted distributions
No. of       Observed
accidents    frequency    Poisson         Negative binomial    P.-i.G.ᵃ
0            103,704      102,629.6       103,723.6            103,710.0
1            14,075       15,922.0        13,989.9             14,054.7
2            1,766        1,235.1         1,857.1              1,784.9
3            255          63.9            245.2                254.5
4            45           2.5             32.3                 40.4
5            6            0.1             4.2                  6.9
6            2            0.0             0.6                  1.3
7+           0            0.0             0.1                  0.3

Parameters                λ = 0.155140    β = 0.150232         λ = 0.144667
                                          r = 1.03267          β = 0.310536
Chi square                1,332.3         12.12                0.78
Degrees of freedom        2               2                    3
p-value                   <1%             <1%                  85.5%
Loglikelihood             -55,108.5       -54,615.3            -54,609.8
SBC                       -55,114.3       -54,627.0            -54,621.5

ᵃ P.-i.G. stands for Poisson-inverse Gaussian.


Example 13.19 Comprehensive medical claims were studied by Bevan [15] in
1963. Male (955 payments) and female (1,291 payments) claims were studied
separately. The data appear in Table 13.21 where there was a deductible of
25. Can a common model be used?


When using the combined data set the lognormal distribution is the best
two-parameter model. Its negative loglikelihood (NLL) is 4,580.20. This
is 19.09 better than the one-parameter inverse exponential model and 0.13
worse than the three-parameter Burr model. Because none of these models
is a special case of the other, the likelihood ratio test (LRT) cannot be used,
but it is clear that using the 1.92 difference as a standard, the lognormal is
preferred. The SBC requires an improvement of 0.5 ln(2,246) = 3.86 and again
the lognormal is preferred. The parameters are $\mu = 4.5237$ and $\sigma = 1.4950$.
When separate lognormal models are fit to males ($\mu = 3.9686$ and $\sigma = 1.8432$)
and females ($\mu = 4.7713$ and $\sigma = 1.2848$), the respective NLLs are 1,977.25
and 2,583.82 for a total of 4,561.07. This is an improvement of 19.13 over a
common lognormal model, which is significant by both the LRT (3.00 needed)
and SBC (7.72 needed). Sometimes it is useful to be able to use the same
nonscale parameter in both models. When a common value of $\sigma$ is used, the
NLL is 4,579.77, which is significantly worse than using separate models. □
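The comparisons in Example 13.19 reduce to a few subtractions and two thresholds. The sketch below recomputes them from the NLL values quoted in the example; the variable names are hypothetical, and the two-degree-of-freedom critical value uses the fact that a chi-square variable with 2 df has survival function exp(-x/2).

```python
import math

n_obs = 955 + 1291                   # male plus female payments
nll_common = 4580.20                 # one lognormal for everyone (2 parameters)
nll_separate = 1977.25 + 2583.82     # separate lognormals (4 parameters)
nll_common_sigma = 4579.77           # separate mu, shared sigma (3 parameters)

# Separate models versus the common model: two extra parameters.
improvement = nll_common - nll_separate
lrt_needed = -math.log(0.05)             # half of the 5% critical value, 2 df
sbc_needed = 0.5 * 2 * math.log(n_obs)   # (r/2) ln n with r = 2

# Separate models versus the common-sigma model: one extra parameter.
sigma_penalty = nll_common_sigma - nll_separate
lrt_needed_one = 3.8415 / 2.0

print(f"improvement = {improvement:.2f}, LRT needs {lrt_needed:.2f}, "
      f"SBC needs {sbc_needed:.2f}")
print(f"common-sigma penalty = {sigma_penalty:.2f}, LRT needs {lrt_needed_one:.2f}")
```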


Table 13.21 Comprehensive medical losses for Example 13.19

Loss            Male    Female
25-50           184     199
50-100          270     310
100-200         160     262
200-300         88      163
300-400         63      103
400-500         47      69
500-1,000       61      124
1,000-2,000     35      40
2,000-3,000     18      12
3,000-4,000     13      4
4,000-5,000     2       1
5,000-6,667     5       2
6,667-7,500     3       1
7,500-10,000    6       1


Example 13.20 In 1958 Longley-Cook [86] examined employment patterns
of casualty actuaries. One of his tables listed the number of members of the
Casualty Actuarial Society employed by casualty companies in 1949 (55
actuaries) and 1957 (78 actuaries). Using the data in Table 13.22 determine
a model for the number of actuaries per company which employs at least one
actuary and find out whether the distribution has changed over the eight-year
period.


Table 13.22 Number of actuaries per company for Example 13.20

Number of    Number of         Number of
actuaries    companies-1949    companies-1957
1            17                23
2            7                 7
3-4          3                 3
5-9          2                 3
10+          0                 1


Because a value of zero is impossible, only zero-truncated distributions
should be considered. In all three cases (1949 data only, 1957 data only,
combined data) the ZT logarithmic and ZT (extended) negative binomial
distributions have acceptable goodness-of-fit test values. The improvement in
NLL is 0.52, 0.02, and 0.94. The LRT can be applied (except that the ZT
logarithmic distribution is a limiting case of the ZT negative binomial
distribution with $r \to 0$), and the improvement is not significant in any of the cases.
The same conclusions apply if the SBC is used. The parameter estimates
(where $\beta$ is the only parameter) are 2.0227, 2.8114, and 2.4479, respectively.
The NLL for the combined data set is 74.35 while the total for the two separate
models is 74.15. The improvement is only 0.20, which is not significant (there
is one degree of freedom). Even though the estimated mean has increased
from 2.0227/ln(3.0227) = 1.8286 to 2.8114/ln(3.8114) = 2.1012, there is not
enough data to make a convincing case that the true mean has increased. □


13.5.4 Exercises 


13.20 (*) One thousand policies were sampled and the number of accidents
for each recorded. The results are in Table 13.23. Without doing any formal
tests, determine which of the following five models is most appropriate:
binomial, Poisson, negative binomial, normal, gamma.


Table 13.23 Data for Exercise 13.20

No. of accidents    No. of policies
[rows not recoverable from the source; only the value 100 is legible]
Total               1,000


Table 13.24 Results for Exercise 13.23

Model                  No. of parameters    Negative loglikelihood
Generalized Pareto     3                    219.1
Burr                   3                    219.2
Pareto                 2                    221.2
Lognormal              2                    221.4
Inverse exponential    1                    224.3


13.21 For Example 13.1, determine if a transformed gamma model is more 
appropriate than either the exponential model or the Weibull model for each 
of the three data sets. 


13.22 (*) From the data in Exercise 13.11 the maximum likelihood estimates 
are $\hat\lambda = 0.60$ for the Poisson distribution and $\hat r = 2.9$ and $\hat\beta = 0.21$ for the
negative binomial distribution. Conduct the likelihood ratio test for choosing 
between these two models. 


13.23 (*) From a sample of size 100, five models are fit with the results given 
in Table 13.24. Use the Schwarz Bayesian criterion to select the best model. 


13.24 This is a continuation of Exercise 12.38. Use both the likelihood ratio 
test (at a 5% significance level) and the Schwarz Bayesian criterion to decide 
if Sylvia’s claim is true. 


13.25 Using the results from Exercises 12.96 and 13.15, use the chi-square
goodness-of-fit test, the likelihood ratio test, and the Schwarz Bayesian criterion
to determine the best model from the members of the (a, b, 0) class.


Table 13.25 Data for Exercise 13.28

No. of medical claims    No. of accidents
[rows illegible in the source]

13.26 Using the results from Exercises 12.98 and 13.16, use the chi-square 
goodness-of-fit test, the likelihood ratio test, and the Schwarz Bayesian crite- 
rion to determine the best model from the members of the (a, b, 0) class. 


13.27 Using the results from Exercises 12.99 and 13.17, use the chi-square 
goodness-of-fit test, the likelihood ratio test, and the Schwarz Bayesian crite- 
rion to determine the best model from the members of the (a, b, 0) class. 


13.28 Table 13.25 gives the number of medical claims per reported automo- 
bile accident. 


(a) Construct a plot similar to Figure 4.8. Does it appear that a member
of the (a, b, 0) class will provide a good model? If so, which
one?


(b) Determine the maximum likelihood estimates of the parameters for 
each member of the (a, b, 0) class. 


(c) Based on the chi-square goodness-of-fit test, the likelihood ratio 
test, and the Schwarz Bayesian criterion, which member of the 
(a,b,0) class provides the best fit? Is this model acceptable? 


13.29 For the four data sets introduced in Exercises 12.96, 12.98, 12.99, 
and 13.28, you have determined the best model from among members of the 
(a, b, 0) class. For each data set determine the maximum likelihood estimates
of the zero-modified Poisson, geometric, logarithmic, and negative binomial 
distributions. Use the chi-square goodness-of-fit test and likelihood ratio tests 
to determine the best of the eight models considered and state whether or not 
the selected model is acceptable. 


13.30 A frequency model that has not been mentioned to this point is the
zeta distribution. It is a zero-truncated distribution with
$p_k^T = k^{-(\rho+1)}/\zeta(\rho+1)$, $k = 1, 2, \ldots$, $\rho > 0$. The denominator is
the zeta function, which must be evaluated numerically as
$\zeta(\rho+1) = \sum_{k=1}^{\infty} k^{-(\rho+1)}$. The zero-modified zeta
distribution can be formed in the usual way. More information can be found
in Luong and Doray [88].


Table 13.26 Data for Exercise 13.32(a)

No. of claims    No. of policies
0                96,978
1                9,240
2                704
3                43
4                9
5+               0


(a) Determine the maximum likelihood estimates of the parameters of 
the zero-modified zeta distribution for the data in Example 12.58. 


(b) Is the zero-modified zeta distribution acceptable? 


13.31 In Exercise 13.29 the best model from among the members of the
(a, b, 0) and (a, b, 1) classes was selected for the data sets in Exercises 12.96,
12.98, 12.99, and 13.28. Fit the Poisson-Poisson, Polya-Aeppli, Poisson-inverse
Gaussian, and Poisson-ETNB distributions to these data and determine
if any of these distributions should replace the one selected in Exercise
13.29. Is the current best model acceptable?


13.32 The five data sets presented in this problem are all taken from Lemaire
[82]. For each data set compute the first three moments and then use the ideas
in Section 4.6.8 to make a guess at an appropriate model from among the
compound Poisson collection (Poisson, geometric, negative binomial,
Poisson-binomial (with m = 2 and m = 3), Polya-Aeppli, Neyman Type A,
Poisson-inverse Gaussian, and Poisson-ETNB). From the selected model (if any) and
members of the (a, b, 0) and (a, b, 1) classes, determine the best model.


(a) The data in Table 13.26 represent counts from third-party auto- 
mobile liability coverage in Belgium. 


(b) The data in Table 13.27 represent the number of deaths due to 
horse kicks in the Prussian army between 1875 and 1894. The 
counts are the number of deaths in a corps (there were 10 of them) 
in a given year, and thus there are 200 observations. This data set 
is often cited as the inspiration for the Poisson distribution. For 
using any of our models, what additional assumption about the 
data must be made? 


(c) The data in Table 13.28 represent the number of major international
wars per year from 1500 through 1931.


Table 13.27 Data for Exercise 13.32(b)

No. of deaths    No. of corps
[rows illegible in the source; the last category has count 0]


Table 13.28 Data for Exercise 13.32(c)


No. of wars No. of years 
0 223 
1 142 
2 48 
3 15 
4 4 
5+ 0 


Table 13.29 Data for Exercise 13.32(d)

No. of runs    No. of half innings
0              1,023
1              222
2              87
3              32
4              18
5              11
6              6
7+             3


(d) The data in Table 13.29 represent the number of runs scored in 
each half-inning of World Series baseball games played from 1947 
through 1960. 


(e) The data in Table 13.30 represent the number of goals per game 
per team in the 1966-1967 season of the National Hockey League. 


Table 13.30 Data for Exercise 13.32(e)

No. of goals    No. of games
[rows illegible in the source; one surviving count is 29]


13.33 Verify that the estimates presented in Example 4.64 are the maximum
likelihood estimates. (Because only two decimals are presented, it is probably
sufficient to observe that the likelihood function takes on smaller values at
each of the nearby points.) The negative binomial distribution was fit to
these data in Example 12.56. Which of these two models is preferable?


14 


Five examples 


14.1 INTRODUCTION 


In this chapter we present five examples that illustrate many of the concepts 
discussed to this point. The first is a model for the time to death. The second 
model is for the time from when a medical malpractice incident occurs to when 
it is reported. The third model is for the amount of a liability payment. This 
model is also continuous but most likely has a decreasing failure rate (typical 
of payment amount variables). On the other hand, time to event variables 
tend to have an increasing failure rate. The last two examples add aggregate 
loss calculations from Chapter 6 to the mix. 


14.2 TIME TO DEATH 


14.2.1 The data 


A variety of mortality tables are available from the Society of Actuaries at
www.soa.org. The typical mortality table provides values of the survival function
at each whole-number age at death. Table 14.1 represents female mortality
in 1900, with only some of the data points presented. It is followed by
Figure 14.1, a graph of the survival function obtained by connecting the given
points with straight lines.


Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.


Table 14.1 1900 female mortality

x     S(x)      x     S(x)      x      S(x)
0     1.000     35    0.681     75     0.233
1     0.880     40    0.650     80     0.140
5     0.814     45    0.617     85     0.062
10    0.796     50    0.580     90     0.020
15    0.783     55    0.534     95     0.003
20    0.766     60    0.478     100    0.000
25    0.739     65    0.410
30    0.711     70    0.328


Fig. 14.1 Survival function for Society of Actuaries data.

The mean residual life function can be obtained by assuming that the 
survival function is indeed a straight line connecting each of the available 
points. From (3.5) it can be computed as the area under the curve beyond 
the given age divided by the value of the survival function at that age. Figure 
14.2 contains a plot of the mean residual life function. The slight increase 
shortly after birth indicates that in 1900 infant mortality was high. Surviving 
the first year after birth adds about five years to one’s expected remaining 
lifetime. After that, the mean residual life steadily decreases, which is the 
effect of aging that we would have expected. 
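
The mean residual life calculation described above can be sketched in a few lines. This is a minimal illustration, assuming (as the text does) that the survival function is a straight line between the tabulated points of Table 14.1; the function and variable names are hypothetical.

```python
# Survival function values from Table 14.1 (1900 female mortality).
AGES = [0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
        80, 85, 90, 95, 100]
SURVIVAL = [1.000, 0.880, 0.814, 0.796, 0.783, 0.766, 0.739, 0.711, 0.681,
            0.650, 0.617, 0.580, 0.534, 0.478, 0.410, 0.328, 0.233, 0.140,
            0.062, 0.020, 0.003, 0.000]


def mean_residual_life(age):
    """Area under the piecewise-linear S(x) beyond `age`, divided by S(age).

    `age` must be one of the tabulated ages with S(age) > 0.
    """
    start = AGES.index(age)
    area = 0.0
    for i in range(start, len(AGES) - 1):
        width = AGES[i + 1] - AGES[i]
        # The trapezoid rule is exact for a straight-line segment.
        area += 0.5 * (SURVIVAL[i] + SURVIVAL[i + 1]) * width
    return area / SURVIVAL[start]


for age in (0, 1, 5, 20, 65):
    print(age, round(mean_residual_life(age), 2))
```

Comparing the values at ages 0 and 1 shows the gain in expected remaining lifetime from surviving the first year.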
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Fig. 14.2 Mean residual life function for Society of Actuaries data. 


14.2.2 Some calculations 


Items such as deductibles, limits, and coinsurances are not particularly interesting
with regard to insurances on human lifetimes. We will consider the
following two questions:


1. For a person age 65, determine the expected present value of providing 
1,000 at the beginning of each year in which the person is alive. The 
interest rate is 6%. 


2. For a person age 20, determine the expected present value of providing 
1,000 at the moment of death. The interest rate is 6%. 


For the first problem, the present value random variable Y can be written
as $Y = 1{,}000(Y_0 + \cdots + Y_{34})$, where $Y_j$ is the present value of that part of the
benefit that pays 1 at age 65 + j if the person is alive at that time. Then,

$$Y_j = \begin{cases} 1.06^{-j} & \text{with probability } \dfrac{S(65+j)}{S(65)}, \\[4pt] 0 & \text{with probability } 1 - \dfrac{S(65+j)}{S(65)}. \end{cases}$$

The answer is then

$$1{,}000 \sum_{j=0}^{34} \frac{1.06^{-j} S(65+j)}{0.410} = 8{,}408.07,$$

where linear interpolation was used for intermediate values of the survival
function.

For the second problem, let $Z = 1{,}000(1.06^{-T})$ be the present value random
variable, where T is the time in years to death of the 20-year-old. The
calculation is

$$E(Z) = 1{,}000 \int_0^{80} \frac{1.06^{-t} f(20+t)}{S(20)}\,dt.$$

When linear interpolation is used to obtain the survival function at intermediate
ages, the density function becomes the slope. That is, if x is a multiple
of 5, then

$$f(t) = \frac{S(x) - S(x+5)}{5}, \quad x < t < x + 5.$$

Breaking the range of integration into 16 pieces gives

$$\begin{aligned}
E(Z) &= \frac{1{,}000}{0.766} \sum_{j=0}^{15} \frac{S(20+5j) - S(25+5j)}{5} \int_{5j}^{5j+5} 1.06^{-t}\,dt \\
&= \frac{200}{0.766} \sum_{j=0}^{15} [S(20+5j) - S(25+5j)]\, \frac{1.06^{-5j} - 1.06^{-5j-5}}{\ln 1.06} \\
&= 155.10.
\end{aligned}$$
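
Both expected present values can be reproduced directly from Table 14.1. The sketch below is one possible implementation, not the authors' own code: it assumes linear interpolation of S(x) between tabulated ages and a 6% interest rate, as stated in the text, and all function names are hypothetical.

```python
import math

AGES = [0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
        80, 85, 90, 95, 100]
SURVIVAL = [1.000, 0.880, 0.814, 0.796, 0.783, 0.766, 0.739, 0.711, 0.681,
            0.650, 0.617, 0.580, 0.534, 0.478, 0.410, 0.328, 0.233, 0.140,
            0.062, 0.020, 0.003, 0.000]
RATE = 0.06


def survival(x):
    """S(x) by linear interpolation between the tabulated ages."""
    for i in range(len(AGES) - 1):
        x0, x1 = AGES[i], AGES[i + 1]
        if x0 <= x <= x1:
            weight = (x - x0) / (x1 - x0)
            return SURVIVAL[i] + weight * (SURVIVAL[i + 1] - SURVIVAL[i])
    raise ValueError("age outside the table")


def annuity_due_epv(age, payment=1000.0):
    """Expected present value of `payment` at the start of each year survived."""
    total = sum(survival(age + j) / (1 + RATE) ** j for j in range(100 - age))
    return payment * total / survival(age)


def insurance_epv(age, benefit=1000.0):
    """Expected present value of `benefit` paid at the moment of death.

    `age` must be a tabulated age; the density is constant on each interval.
    """
    delta = math.log(1 + RATE)
    total = 0.0
    for i in range(AGES.index(age), len(AGES) - 1):
        t0, t1 = AGES[i] - age, AGES[i + 1] - age
        density = (SURVIVAL[i] - SURVIVAL[i + 1]) / (t1 - t0)
        # Integral of 1.06**(-t) over (t0, t1).
        discount_integral = ((1 + RATE) ** -t0 - (1 + RATE) ** -t1) / delta
        total += density * discount_integral
    return benefit * total / survival(age)


annuity_65 = annuity_due_epv(65)
insurance_20 = insurance_epv(20)
print(round(annuity_65, 2), round(insurance_20, 2))
```

The annuity sum runs over ages 65 through 99 because the tabulated survival function is zero at age 100.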


While it is unusual for a parametric model to be used, we will do so anyway.
Consider the Makeham distribution with hazard rate function $h(x) = A + Bc^x$.
Then

$$S(x) = \exp\left[-Ax - \frac{B(c^x - 1)}{\ln c}\right].$$

Maximum likelihood estimation cannot be used because no sample size is
given. Because it is unlikely that this model will be effective below age 20, only
information beyond that age will be used. Assume that there were 1,000
lives at age 0 who died according to the survival function in Table 14.1. Then,
for example, the contribution to the likelihood function for the interval from
age 30 to 35 is $30\ln\{[S(30) - S(35)]/S(20)\}$ with the survival function using
the Makeham distribution. The sample size comes from 1,000(0.711 - 0.681)
with these survival function values taken from the “data.”¹ The values that
maximize this likelihood function are $\hat{A} = 0.006698$, $\hat{B} = 0.00007976$, and
$\hat{c} = 1.09563$. In Figure 14.3 the diamonds represent the “data” and the solid
curve is the Makeham survival function (both have been conditioned on being
alive at age 20). The fit is almost too good, suggesting that perhaps this
mortality table was already smoothed to follow a Makeham distribution at
adult ages.


Fig. 14.3 Comparison of “data” and Makeham model.


The same calculations can be done. For the annuity, no interpolation is
needed because the Makeham function provides the survival function values
at each age. The answer is 8,405.24. For the insurance, it is difficult to do
the integral analytically. Linear interpolation was used between integral ages
to produce an answer of 154.90. The agreement with the answers obtained
earlier is not surprising.


¹ Aside from not knowing the sample size, the values in Table 14.1 are probably not random
observations. It is possible the values in the table were smoothed using techniques of the
kind discussed in Chapter 15.
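
The Makeham versions of the two calculations can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it uses the rounded parameter estimates quoted in the text (so results may differ slightly in the last digits), truncates the sums at an assumed maximum age of 120 where the survival probability is negligible, and uses hypothetical function names.

```python
import math

A, B, C = 0.006698, 0.00007976, 1.09563  # estimates quoted in the text
RATE = 0.06
MAX_AGE = 120  # assumed cutoff; survival beyond it is negligible


def makeham_survival(x):
    """S(x) = exp[-A x - B (c^x - 1) / ln c]."""
    return math.exp(-A * x - B * (C ** x - 1.0) / math.log(C))


def annuity_due_epv(age, payment=1000.0):
    """Payment at the start of each year survived; no interpolation needed."""
    total = sum(makeham_survival(age + j) / (1 + RATE) ** j
                for j in range(MAX_AGE - age))
    return payment * total / makeham_survival(age)


def insurance_epv(age, benefit=1000.0):
    """Benefit at death, with S(x) taken as linear between integral ages."""
    delta = math.log(1 + RATE)
    total = 0.0
    for t in range(MAX_AGE - age):
        deaths = makeham_survival(age + t) - makeham_survival(age + t + 1)
        total += deaths * ((1 + RATE) ** -t - (1 + RATE) ** -(t + 1)) / delta
    return benefit * total / makeham_survival(age)


annuity_65 = annuity_due_epv(65)
insurance_20 = insurance_epv(20)
print(round(annuity_65, 2), round(insurance_20, 2))
```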


14.2.3 Exercise 


14.1 From ages 5 through 100 the mean residual life function is essentially
linear. Because insurances are rarely sold under age 5, it would be reasonable
to extend the graph linearly back to 0. Then a reasonable approximation is
$e(x) = 60 - 0.6x$. From this, determine the density and survival function for
the age at death and then use this function to solve the two problems.


14.3 TIME FROM INCIDENCE TO REPORT 


Consider an insurance contract that provides payment when a certain event 
(such as death, disability, fire) occurs. There are three key dates. The first 
is when the event occurs, the second is when it is reported to the insurance 
company, and the third is when the claim is settled. The time between these 
dates is important because it affects the amount of interest that can be earned 
on the premium prior to paying the claim and because it provides a mecha- 
nism for estimating unreported claims. This example concerns the time from 
incidence to report. The particular example used here is based on a paper by
Accomando and Weissner [4].


Fig. 14.4 Mean residual life function for report lag data.


14.3.1 The problem and some data 


This example concerns medical malpractice claims that occurred in a particular
year. One hundred sixty-eight months after the beginning of the year
under study, there have been 463 claims reported that were known to have
occurred in that year. The distribution of the times from occurrence to report
(by month in six-month intervals) is given in Table 14.2. A graph of the mean
residual life function appears in Figure 14.4.²

Your task is to fit a model to these observations and then use the model to 
estimate the total number of claims that occurred in the year under study. A 
look at the mean residual life function indicates a decreasing pattern and so 
a lighter than exponential tail is expected. A Weibull model can have such a 
tail and so can be used here. 


14.3.2 Analysis 


Table 14.2 Medical malpractice report lags

Lag in months    No. of claims    Lag in months    No. of claims
0-6              4                84-90            11
6-12             6                90-96            9
12-18            8                96-102           7
18-24            38               102-108          13
24-30            45               108-114          5
30-36            36               114-120          2
36-42            62               120-126          7
42-48            33               126-132          17
48-54            29               132-138          5
54-60            24               138-144          8
60-66            22               144-150          2
66-72            24               150-156          6
72-78            21               156-162          2
78-84            17               162-168          0


² Because of the right truncation of the data, there are some items missing for calculation
of the mean residual life. It is not clear from the data what the effect will be. This picture
gives a guide, but the model ultimately selected should both fit the data and be reasonable
based on the analyst’s experience and judgment.


Using maximum likelihood to estimate the Weibull parameters, the result is
$\hat{\tau} = 1.71268$ and $\hat{\theta} = 67.3002$. According to the Weibull distribution, the
probability that a claim is reported by time 168 is

$$F(168) = 1 - e^{-(168/\theta)^{\tau}}.$$

If N is the unknown total number of claims, the number observed by time
168 is the result of binomial sampling, and thus on an expected value basis
we obtain

$$\text{Expected number of reported claims by time 168} = N\left[1 - e^{-(168/\theta)^{\tau}}\right].$$

Setting this expectation equal to the observed number reported of 463 and
then solving for N yields

$$N = \frac{463}{1 - e^{-(168/\theta)^{\tau}}}.$$

Inserting the parameter estimates yields the value 466.88. Thus, after 14
years, we expect to have about four more claims reported.

The delta method (Theorem 12.17) can be used to produce a 95% confidence
interval. It is 466.88 ± 2.90, indicating that there could reasonably be
between one and seven additional claims reported.
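
The point estimate of the total number of claims follows directly from the fitted Weibull distribution. The sketch below simply evaluates the formula for N with the parameter estimates quoted in the text; the names are hypothetical, and it does not reproduce the maximum likelihood fit or the delta-method interval.

```python
import math

TAU, THETA = 1.71268, 67.3002   # Weibull estimates quoted in the text
REPORTED = 463                  # claims reported within 168 months
HORIZON = 168.0


def weibull_cdf(x, tau=TAU, theta=THETA):
    """F(x) = 1 - exp[-(x / theta)^tau]."""
    return 1.0 - math.exp(-((x / theta) ** tau))


prob_reported = weibull_cdf(HORIZON)
total_claims = REPORTED / prob_reported
unreported = total_claims - REPORTED
print(round(prob_reported, 6), round(total_claims, 2), round(unreported, 2))
```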


14.4 PAYMENT AMOUNT 


You are the consulting actuary for a reinsurer and have been asked to de- 
termine the expected cost and the risk (as measured by the coefficient of 
variation) for various coverages. To help you out, losses from 200 claims have 
been supplied. The reinsurer also estimates (and you may confidently rely on 
its estimate) that there will be 21 losses per year and the number of losses 
has a Poisson distribution. The coverages it is interested in are full coverage, 
1 million excess of 250,000, and 2 million excess of 500,000. The phrase “z
excess of y” is to be interpreted as d = y and u = y + z in the notation of
Theorem 5.13.


Table 14.3 Losses up to 200 (thousand)


Loss range Number of Loss range Number of 
(thousands) losses (thousands) losses 
1-5 3 41-50 19 
6-10 12 51-75 28 
11-15 14 76-100 21 
16-20 9 101-125 15 
21-25 7 126-150 10 
26-30 7 151-200 15 
31-40 18 


[Histogram; horizontal axis: Loss]

Fig. 14.5 Histogram of losses.


14.4.1 The data 


The 178 supplied losses that were 200,000 or below (all expressed in whole
numbers of thousands of dollars) are summarized in Table 14.3. In addition,
there were 22 losses in excess of 200. They are listed
below:


206 219 230 235 241 272 283 286 312 319 385 
427 434 555 562 584 700 711 869 980 999 1506 


Finally, the 178 losses in the table sum to 11,398 and their squares sum to 
1,143,164. 

To get a feel for the data in the table, the histogram in Figure 14.5 was
constructed. Keep in mind that the height of a histogram bar is the count
in the cell divided by the sample size (200) and then further divided by the
interval width. Therefore, the first bar has a height of 3/[200(5)] = 0.003.


Table 14.4 Mean residual life for losses above 200 (thousand)

Loss    Mean residual life
200     314
300     367
400     357
500     330
600     361
700     313
800     289
900     262


It can be seen from the histogram that the underlying distribution has a 
nonzero mode. To check the tail, we can compute the empirical mean residual 
life function at a number of values. They are presented in Table 14.4. The 
function appears to be fairly constant and so an exponential model seems 
reasonable. 
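
The empirical mean residual life values can be recomputed from the 22 individually listed losses. This sketch is illustrative only: it is valid for thresholds of 200 or more (the smaller losses are available only in grouped form), it averages the excess over losses strictly above the threshold, and the function name is hypothetical.

```python
# The 22 losses above 200 (thousand) listed in the text.
LARGE_LOSSES = [206, 219, 230, 235, 241, 272, 283, 286, 312, 319, 385,
                427, 434, 555, 562, 584, 700, 711, 869, 980, 999, 1506]


def empirical_mean_residual_life(d):
    """Average of (x - d) over the observed losses strictly above d."""
    excesses = [x - d for x in LARGE_LOSSES if x > d]
    return sum(excesses) / len(excesses)


mrl_table = {d: empirical_mean_residual_life(d) for d in range(200, 1000, 100)}
for d, value in mrl_table.items():
    print(d, round(value, 1))
```

The value at 200 is also the sample mean of the excess losses, which is the maximum likelihood estimate of the exponential parameter for the tail.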


14.4.2 The first model 


A two-component spliced model was selected. The empirical model is used
through 200 (thousand) and an exponential model thereafter. There are (at
least) two ways to choose the exponential model. One is to restrict the parameter
by forcing the distribution to place 11% (22 out of 200) of probability
at points above 200. The other option is to estimate the exponential model
independent of the 11% requirement and then multiply the density function
to make the area above 200 be 0.11. The latter was selected and the resulting
parameter estimate is $\hat{\theta} = 314$. For values below 200, the empirical
distribution places probability 1/200 at each observed value. The resulting
exponential density function (for x > 200) is

$$f(x) = 0.000662344e^{-x/314}.$$

For a coverage that pays all losses, the kth moment is (where the 200 losses
in the sample have been ordered from smallest to largest)

$$E(X^k) = \frac{1}{200}\sum_{j=1}^{178} x_j^k + \int_{200}^{\infty} x^k f(x)\,dx.$$

Then,

$$E(X) = \frac{11{,}398}{200} + 0.000662344[314(200) + 314^2]e^{-200/314} = 113.53,$$

$$E(X^2) = \frac{1{,}143{,}164}{200} + 0.000662344[314(200)^2 + 2(314)^2(200) + 2(314)^3]e^{-200/314} = 45{,}622.93.$$
The variance is $45{,}622.93 - 113.53^2 = 32{,}733.87$ for a coefficient of variation
of 1.59. However, these are for one loss only. The distribution of annual losses
follows a compound Poisson distribution. The mean is

$$E(S) = E(N)E(X) = 21(113.53) = 2{,}384.13$$

and the variance is

$$\mathrm{Var}(S) = E(N)\mathrm{Var}(X) + \mathrm{Var}(N)E(X)^2 = 21(32{,}733.87) + 21(113.53)^2 = 958{,}081.53$$

for a coefficient of variation of 0.41.
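
The full-coverage figures can be reproduced from the summary statistics alone. The sketch below is a minimal illustration of the spliced-model moment formulas, with hypothetical variable names; because it carries the scaling constant to full precision rather than rounding it to 0.000662344, the second moment and variance differ from the quoted values in the last digits.

```python
import math

THETA = 314.0            # exponential parameter for the tail
TAIL_WEIGHT = 22 / 200   # probability assigned above 200
SPLICE = 200.0
SUM_SMALL, SUMSQ_SMALL = 11_398, 1_143_164  # sums over the 178 small losses
N_TOTAL = 200
POISSON_MEAN = 21

# Density above 200 is c * exp(-x / theta), scaled to integrate to 0.11.
c = TAIL_WEIGHT / THETA * math.exp(SPLICE / THETA)
tail = math.exp(-SPLICE / THETA)

mean_x = SUM_SMALL / N_TOTAL + c * (THETA * SPLICE + THETA**2) * tail
second_x = SUMSQ_SMALL / N_TOTAL + c * (
    THETA * SPLICE**2 + 2 * THETA**2 * SPLICE + 2 * THETA**3) * tail
var_x = second_x - mean_x**2
cv_x = math.sqrt(var_x) / mean_x

# Compound Poisson: E(S) = lambda E(X) and Var(S) = lambda E(X^2).
mean_s = POISSON_MEAN * mean_x
var_s = POISSON_MEAN * second_x
cv_s = math.sqrt(var_s) / mean_s
print(round(mean_x, 2), round(cv_x, 2), round(mean_s, 2), round(cv_s, 2))
```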


For the other coverages we need general formulas for the first two limited
expected moments. For u > 200,

$$\begin{aligned}
E(X \wedge u) &= 56.99 + \int_{200}^{u} x f(x)\,dx + \int_{u}^{\infty} u f(x)\,dx \\
&= 56.99 + c\int_{200}^{u} x e^{-x/314}\,dx + c\int_{u}^{\infty} u e^{-x/314}\,dx \\
&= 56.99 + c\left(-314xe^{-x/314} - 314^2 e^{-x/314}\right)\Big|_{200}^{u} + \left(-cu314e^{-x/314}\right)\Big|_{u}^{\infty} \\
&= 56.99 + c\left(161{,}396e^{-200/314} - 314^2 e^{-u/314}\right),
\end{aligned}$$

where c = 0.000662344 and similarly

$$E[(X \wedge u)^2] = 5{,}715.82 + c\int_{200}^{u} x^2 e^{-x/314}\,dx + c\int_{u}^{\infty} u^2 e^{-x/314}\,dx.$$


Table 14.5 Limited moment calculations

u        E(X ∧ u)    E[(X ∧ u)²]
250      84.07       12,397.08
500      100.24      23,993.47
1,250    112.31      41,809.37
2,500    113.51      45,494.83

Table 14.5 gives the quantities needed to complete the assignment.
The requested moments for the 1,000 excess of 250 coverage are, for one
loss,

$$\begin{aligned}
\text{Mean} &= 112.31 - 84.07 = 28.24, \\
\text{Second moment} &= 41{,}809.37 - 12{,}397.08 - 2(250)(28.24) = 15{,}292.29, \\
\text{Variance} &= 15{,}292.29 - 28.24^2 = 14{,}494.79, \\
\text{Coefficient of variation} &= \frac{\sqrt{14{,}494.79}}{28.24} = 4.26.
\end{aligned}$$

It is interesting to note that while, as expected, the coverage limitations reduce
the variance, the risk, as measured by the coefficient of variation, has increased
considerably. For a full year, the mean is 593.04, the variance is 321,138.09,
and the coefficient of variation is 0.96.

For the 2,000 excess of 500 coverage, we have, for one loss,

$$\begin{aligned}
\text{Mean} &= 113.51 - 100.24 = 13.27, \\
\text{Second moment} &= 45{,}494.83 - 23{,}993.47 - 2(500)(13.27) = 8{,}231.36, \\
\text{Variance} &= 8{,}231.36 - 13.27^2 = 8{,}055.27, \\
\text{Coefficient of variation} &= \frac{\sqrt{8{,}055.27}}{13.27} = 6.76.
\end{aligned}$$

Moving further into the tail increases our risk. For one year, the three items
are 278.67, 172,858.56, and 1.49.
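
The limited moments and the layer results can be computed together. The following is a sketch under the spliced model of Section 14.4.2, with hypothetical function names; the closed form for the second limited moment comes from integrating $x^2 e^{-x/\theta}$ by parts, which is an added step not written out in the text. Because the code works with unrounded intermediate values, the layer results differ slightly from those obtained from the rounded table entries.

```python
import math

THETA = 314.0
C = 0.11 / THETA * math.exp(200.0 / THETA)   # about 0.000662344
MEAN_SMALL = 11_398 / 200                    # 56.99
SECOND_SMALL = 1_143_164 / 200               # 5,715.82
POISSON_MEAN = 21


def limited_first(u):
    """E(X ^ u) for u > 200 under the spliced model."""
    return MEAN_SMALL + C * (
        (THETA * 200 + THETA**2) * math.exp(-200 / THETA)
        - THETA**2 * math.exp(-u / THETA))


def limited_second(u):
    """E[(X ^ u)^2] for u > 200 under the spliced model."""
    lower = THETA * (200**2 + 2 * THETA * 200 + 2 * THETA**2) * math.exp(-200 / THETA)
    upper = THETA * (2 * THETA * u + 2 * THETA**2) * math.exp(-u / THETA)
    return SECOND_SMALL + C * (lower - upper)


def layer_summary(d, u):
    """Per-loss mean, variance and CV, plus the annual CV, for the layer d to u."""
    mean = limited_first(u) - limited_first(d)
    second = limited_second(u) - limited_second(d) - 2 * d * mean
    variance = second - mean**2
    annual_cv = math.sqrt(POISSON_MEAN * second) / (POISSON_MEAN * mean)
    return mean, variance, math.sqrt(variance) / mean, annual_cv


for d, u in ((250, 1250), (500, 2500)):
    print(d, u, [round(v, 2) for v in layer_summary(d, u)])
```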


14.4.3 The second model 


From Figure 14.5, if a single parametric distribution is to be used, one with a
nonzero mode should be tried. Because the data were rounded to the nearest
1,000, the intervals should be treated as 0.5-5.5, 5.5-10.5, and so on. After
considering lognormal, Weibull, gamma, and mixture models (adding an exponential
distribution), the lognormal distribution is clearly superior (using
the SBC). The parameters are $\hat{\mu} = 4.0626$ and $\hat{\sigma} = 1.1466$. The chi-square
goodness-of-fit test (placing the observations above 200 into a single group)
statistic is 7.77 for a p-value of 0.73. Figure 14.6 compares the lognormal
model to the empirical model. The graph confirms the good fit.


Fig. 14.6 Distribution function plot.


14.5 AN AGGREGATE LOSS EXAMPLE 


The example to be covered in this section summarizes many of the techniques
introduced up to this point. The coverage is perhaps more complex than
those found in practice, but that gives us a chance to work through a variety
of tasks.

Example 14.1 You are a consulting actuary and have been retained to assist 
in the pricing of a group hospitalization policy. Your task is to determine 
the expected payment to be made by the insurer. The terms of the policy (per 
covered employee) are as follows: 


1. For each hospitalization of the employee or a member of the employee's
family, the employee pays the first 500 plus any losses in excess of
50,500. On any one hospitalization, the insurance will pay at most
50,000.


2. In any calendar year, the employee will pay no more than 1,000 in deductibles,
but there is no limit on how much the employee will pay in
respect of losses exceeding 50,500.


3. Any particular hospitalization is assigned to the calendar year in which
the individual entered the hospital. Even if hospitalization extends into
subsequent years, all payments are made in respect to the policy year
assigned.


4. The premium is the same, regardless of the number of family members.


Table 14.6 Hospitalizations, per family member, per year

No. of hospitalizations per family member    No. of family members
0                                            2,659
1                                            244
2                                            19
3                                            2
4 or more                                    0
Total                                        2,924


Table 14.7 Number of family members per employee

No. of family members per employee    No. of employees
1                                     84
2                                     140
3                                     139
4                                     131
5                                     73
6                                     42
7                                     27
8 or more                             33
Total                                 669


Experience studies have provided the data contained in Tables 14.6 and 
14.8. The data in Table 14.7 represent the profile of the current set of em- 
ployees. 


The first step is to fit parametric models to each of the three data sets. For 
the data in Table 14.6, 12 distributions were fitted. The best one-parameter 
distribution is the geometric with a negative loglikelihood (NLL) of 969.251 
and a chi-square goodness-of-fit p-value of 0.5325. The best two-parameter 
model is the zero-modified geometric. The NLL improves to 969.058, but by 
the likelihood ratio test, this is not sufficient to justify the second parameter. 
The best three-parameter distribution is the zero-modified negative binomial, 
which has an NLL of 969.056, again not enough to dislodge the geometric as 
our choice. For the two- and three-parameter models there were not enough 
degrees of freedom to conduct the chi-square test. We choose the geometric
distribution with $\hat{\beta} = 0.098495$.


Table 14.8 Losses per hospitalization

Loss per hospitalization    No. of hospitalizations
0-250                       36
250-500                     29
500-1,000                   43
1,000-1,500                 35
1,500-2,500                 39
2,500-5,000                 47
5,000-10,000                33
10,000-50,000               24
50,000-                     2
Total                       288


For the data in Table 14.7, only zero-truncated distributions should be considered.
The best one-parameter model is the zero-truncated Poisson with an
NLL of 1,298.725 and a p-value near zero. The two-parameter zero-truncated
negative binomial has an NLL of 1,292.532, a significant improvement. The
p-value is 0.2571, indicating that this is an acceptable choice. The parameters
are $\hat{r} = 13.207$ and $\hat{\beta} = 0.25884$.

For the data in Table 14.8, 15 continuous distributions were fitted. The
four best models for a given number of parameters are listed in Table 14.9. It
should be clear that the best choice is the Pareto distribution. The parameters
are $\hat{\alpha} = 1.6693$ and $\hat{\theta} = 3{,}053.0$.
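
The frequency fit for Table 14.6 described above is simple enough to verify by hand or in code. The sketch below assumes the geometric probability function $p_k = \beta^k/(1+\beta)^{k+1}$, for which the maximum likelihood estimate is the sample mean; it then repeats the likelihood ratio comparison with the zero-modified geometric using the NLL of 969.058 quoted in the text. The names are hypothetical, and 3.841 is the standard 5% chi-square critical value for one degree of freedom.

```python
import math

# Table 14.6: hospitalizations per family member -> number of family members.
COUNTS = {0: 2659, 1: 244, 2: 19, 3: 2}


def geometric_nll(beta, counts):
    """Negative loglikelihood with p_k = beta^k / (1 + beta)^(k + 1)."""
    loglik = 0.0
    for k, n in counts.items():
        loglik += n * (k * math.log(beta) - (k + 1) * math.log(1.0 + beta))
    return -loglik


n_members = sum(COUNTS.values())
beta_hat = sum(k * n for k, n in COUNTS.items()) / n_members  # sample mean
nll_geometric = geometric_nll(beta_hat, COUNTS)

# Likelihood ratio test against the zero-modified geometric (NLL 969.058).
lrt_statistic = 2.0 * (nll_geometric - 969.058)
CHI2_1DF_5PCT = 3.841
extra_parameter_justified = lrt_statistic > CHI2_1DF_5PCT
print(round(beta_hat, 6), round(nll_geometric, 3), extra_parameter_justified)
```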

Table 14.9 Four best models for loss per hospitalization

Name                   No. of parameters    NLL        p-value
Inverse exponential    1                    632.632    Near 0
Pareto                 2                    601.642    0.9818
Burr                   3                    601.612    0.9476
Transformed beta       4                    601.553    0.8798


The remaining calculations were done using the recursive method, but inversion
or simulation would work equally well.

The first step is to determine the distribution of payments by the employee
per family member with regard to the deductible. The frequency distribution
is the geometric distribution while the individual loss distribution is the Pareto
distribution, limited to the maximum deductible of 500. That is, any losses
in excess of 500 are assigned to the value 500. With regard to discretization
for recursion, the span should divide evenly into 500 and then all probability
not accounted for by the time 500 is reached is placed there. For this example
a span of 1 was used. The first few and last few values of the discretized
distribution appear in Table 14.10. After applying the recursive formula, it is
clear that there is nonzero probability beyond 3,000. However, looking ahead,
we know that, with regard to the employee aggregate deductible, payments
beyond 1,000 have no impact. A few of these probabilities appear in Table
14.11.


Table 14.10 Discretized Pareto distribution with 500 limit

Loss    Probability
0       0.000273
1       0.000546
2       0.000546
3       0.000545
...
498     0.000365
499     0.000365
500     0.776512


Table 14.11 Probabilities for aggregate deductibles per family member

Loss     Probability
0        0.910359
1        0.000045
2        0.000045
3        0.000045
...
499      0.000031
500      0.063386
501      0.000007
...
999      0.000004
1,000    0.004413
1,001    0.000001
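
The discretization and the first recursion can be sketched as follows. This is one possible implementation under stated assumptions, not the authors' code: it uses the method of rounding with a span of 1, the rounded parameter estimates quoted in the text (so the last digit of a probability may differ), and the recursive formula for a compound geometric distribution, which is the (a, b, 0) case with a = β/(1 + β) and b = 0. The function names are hypothetical.

```python
ALPHA, THETA = 1.6693, 3053.0   # Pareto estimates quoted in the text
BETA = 0.098495                 # geometric estimate quoted in the text
LIMIT = 500                     # per-hospitalization deductible
MAX_TOTAL = 1000                # larger totals are not needed later


def pareto_cdf(x):
    return 1.0 - (THETA / (x + THETA)) ** ALPHA


def discretize_by_rounding(limit):
    """Span-1 method of rounding; leftover probability is placed at `limit`."""
    probs = [pareto_cdf(0.5)]
    probs += [pareto_cdf(k + 0.5) - pareto_cdf(k - 0.5) for k in range(1, limit)]
    probs.append(1.0 - pareto_cdf(limit - 0.5))
    return probs


def compound_geometric(severity, beta, max_total):
    """Recursion g_k = a * sum_j f_j g_{k-j} / (1 - a f_0) with a = beta/(1+beta)."""
    a = beta / (1.0 + beta)
    f0 = severity[0]
    g = [1.0 / (1.0 + beta * (1.0 - f0))]   # pgf of the geometric evaluated at f0
    for k in range(1, max_total + 1):
        top = min(k, len(severity) - 1)
        total = sum(severity[j] * g[k - j] for j in range(1, top + 1))
        g.append(a * total / (1.0 - a * f0))
    return g


severity = discretize_by_rounding(LIMIT)
per_member = compound_geometric(severity, BETA, MAX_TOTAL)
for k in (0, 1, 500, 1000):
    print(k, round(per_member[k], 6))
```

Stopping the recursion at 1,000 is a design choice that mirrors the remark that payments beyond 1,000 have no impact on the employee's aggregate deductible.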

We next must obtain the aggregate distribution of deductibles paid per
employee per year. This is another compound distribution. The frequency
distribution is the truncated negative binomial and the individual loss distribution
is the one for losses per family member that was just obtained.


Table 14.12 Probabilities for aggregate deductibles per employee

Loss     Probability
0        0.725517
1        0.000116
2        0.000115
3        0.000115
...
499      0.000082
500      0.164284
501      0.000047
...
999      0.000031
1,000    0.042343


Recursions can again be used to obtain this distribution. Because there is a
1,000 limit on deductibles, all probability to the right of 1,000 will be placed
at 1,000. Selected values from this aggregate distribution are given in Table
14.12. Note that the chance that more than 1,000 in deductibles will be paid is
very small. The cost to the insurer of limiting the insured’s costs is also small.
Using this discrete distribution, it is easy to obtain the mean and standard
deviation of aggregate deductibles. They are 150.02 and 274.42, respectively.
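
Both recursions can be chained to obtain the per-employee distribution and its moments. The sketch below is an illustration under stated assumptions: it uses the rounded parameter estimates quoted in the text, a span of 1, and the standard (a, b, 1) recursive formula, with the zero-truncated negative binomial as the second frequency distribution. All names are hypothetical, and small differences from the quoted figures should be expected because of parameter rounding.

```python
import math

ALPHA, THETA = 1.6693, 3053.0      # Pareto (loss per hospitalization)
BETA_GEO = 0.098495                # geometric (hospitalizations per member)
R_NB, BETA_NB = 13.207, 0.25884    # zero-truncated negative binomial
CAP = 1000                         # annual limit on deductibles


def pareto_cdf(x):
    return 1.0 - (THETA / (x + THETA)) ** ALPHA


def panjer(severity, a, b, p0, p1, g0, max_total):
    """(a, b, 1) recursion; with p1 = (a + b) * p0 it reduces to (a, b, 0)."""
    g = [g0]
    f0 = severity[0]
    for k in range(1, max_total + 1):
        top = min(k, len(severity) - 1)
        total = sum((a + b * j / k) * severity[j] * g[k - j]
                    for j in range(1, top + 1))
        if k < len(severity):
            total += (p1 - (a + b) * p0) * severity[k]
        g.append(total / (1.0 - a * f0))
    return g


# Stage 1: deductible payments per family member (span 1, limit 500).
sev = [pareto_cdf(0.5)]
sev += [pareto_cdf(k + 0.5) - pareto_cdf(k - 0.5) for k in range(1, 500)]
sev.append(1.0 - pareto_cdf(499.5))
a1 = BETA_GEO / (1.0 + BETA_GEO)
p0_geo = 1.0 / (1.0 + BETA_GEO)
member = panjer(sev, a1, 0.0, p0_geo, a1 * p0_geo,
                1.0 / (1.0 + BETA_GEO * (1.0 - sev[0])), CAP)

# Stage 2: sum over the zero-truncated negative binomial number of members.
a2 = BETA_NB / (1.0 + BETA_NB)
b2 = (R_NB - 1.0) * a2
trunc = 1.0 - (1.0 + BETA_NB) ** -R_NB          # 1 - p0 of the untruncated model
p1_zt = R_NB * BETA_NB * (1.0 + BETA_NB) ** -(R_NB + 1.0) / trunc
g0_zt = ((1.0 + BETA_NB * (1.0 - member[0])) ** -R_NB - (1.0 - trunc)) / trunc
employee = panjer(member, a2, b2, 0.0, p1_zt, g0_zt, CAP)

# Place all probability at or beyond the cap on the cap itself.
employee[CAP] = 1.0 - sum(employee[:CAP])
mean = sum(k * p for k, p in enumerate(employee))
std = math.sqrt(sum(k * k * p for k, p in enumerate(employee)) - mean**2)
print(round(employee[0], 6), round(employee[500], 6), round(mean, 2), round(std, 2))
```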
We next require the expected value of aggregate costs to the insurer for
individual losses below the upper limit of 50,000. This can be found analytically.
The expected payment per loss is $E(X \wedge 50{,}500) = 3{,}890.87$ for
the Pareto distribution. The expected number of losses per family member
is the mean of the geometric distribution, which is the parameter, 0.098495.
The expected number of family members per employee comes from the zero-truncated
negative binomial distribution and is 3.59015. This implies that
the expected number of losses per employee is 0.098495(3.59015) = 0.353612.
Then the expected aggregate dollars in payments up to the individual limit is
0.353612(3,890.87) = 1,375.86.

Then the expected cost to the insurer is the difference 1,375.86 - 150.02 =
1,225.84. As a final note, it is not possible to use any method other than
simulation if the goal is to obtain the probability distribution of the insurer’s
payments. This situation is similar to that of Example 17.7, where it is easy to
get the overall distribution, as well as the distribution for the insured (in this
case, if payments for losses over 50,500 are ignored), but not for the insurer. □
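
The analytic part of the expected cost can be checked with the closed-form Pareto limited expected value and the mean of the zero-truncated negative binomial. The sketch below uses the rounded parameter estimates quoted in the text, so its results agree with the quoted figures only to within rounding; the mean aggregate deductible of 150.02 is taken as given, and the function names are hypothetical.

```python
ALPHA, THETA = 1.6693, 3053.0
BETA_GEO = 0.098495
R_NB, BETA_NB = 13.207, 0.25884
EXPECTED_DEDUCTIBLES = 150.02   # mean aggregate deductible quoted in the text


def pareto_limited_expected_value(u):
    """E(X ^ u) = theta/(alpha - 1) * [1 - (theta/(u + theta))^(alpha - 1)]."""
    return THETA / (ALPHA - 1.0) * (1.0 - (THETA / (u + THETA)) ** (ALPHA - 1.0))


def zt_negative_binomial_mean(r, beta):
    """Mean of the zero-truncated negative binomial: r beta / (1 - (1+beta)^-r)."""
    return r * beta / (1.0 - (1.0 + beta) ** -r)


payment_per_loss = pareto_limited_expected_value(50_500)
members_per_employee = zt_negative_binomial_mean(R_NB, BETA_NB)
losses_per_employee = BETA_GEO * members_per_employee
payments_up_to_limit = losses_per_employee * payment_per_loss
insurer_cost = payments_up_to_limit - EXPECTED_DEDUCTIBLES
print(round(payment_per_loss, 2), round(losses_per_employee, 6),
      round(insurer_cost, 2))
```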
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14.6 ANOTHER AGGREGATE LOSS EXAMPLE 


Careful modeling has revealed that individual losses have the lognormal distribution
with $\mu = 10.5430$ and $\sigma = 2.31315$. It has also been determined
that the number of losses has the Poisson distribution with $\lambda = 0.0154578$.

Begin by considering excess of loss reinsurance in which the reinsurance
pays the excess over a deductible, d, up to a maximum payment u - d, where
u is the limit established in the primary coverage. There are two approaches
available to create the distribution of reinsurer payments. The first is to work
with the distribution of payments per payment. On this basis, the severity
distribution is mixed, with pdf

$$f_Y(x) = \frac{f_X(x+d)}{1 - F_X(d)}, \quad 0 < x < u - d,$$

and discrete probability

$$\Pr(Y = u - d) = \frac{1 - F_X(u)}{1 - F_X(d)}.$$

This distribution would then be discretized for use with the recursive formula
or the FFT or approximated by a histogram for use with the Heckman-Meyers
method. Regardless, the frequency distribution must be adjusted to reflect
the distribution of the number of payments as opposed to the number of losses.
The new Poisson parameter will be $\lambda[1 - F_X(d)]$.
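
The pieces of the per-payment formulation can be evaluated in closed form for the lognormal distribution. The sketch below is illustrative only: the deductible of 500,000 and limit of 1,000,000 are values chosen for this example, the lognormal limited expected value formula is the standard one, and the function names are hypothetical. It computes the adjusted Poisson parameter, the probability mass at the maximum payment, and the expected annual reinsurer payment for one policy.

```python
import math

MU, SIGMA = 10.5430, 2.31315   # lognormal severity
LAMBDA = 0.0154578             # Poisson frequency for one policy


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf(x):
    return std_normal_cdf((math.log(x) - MU) / SIGMA)


def lognormal_limited_expected_value(u):
    """E(X ^ u) for the lognormal distribution."""
    z = (math.log(u) - MU) / SIGMA
    return (math.exp(MU + 0.5 * SIGMA**2) * std_normal_cdf(z - SIGMA)
            + u * (1.0 - std_normal_cdf(z)))


def excess_of_loss_summary(d, u):
    """Payment frequency, mass at the maximum payment, expected annual cost."""
    payment_rate = LAMBDA * (1.0 - lognormal_cdf(d))        # lambda [1 - F_X(d)]
    prob_max = (1.0 - lognormal_cdf(u)) / (1.0 - lognormal_cdf(d))
    expected_cost = LAMBDA * (lognormal_limited_expected_value(u)
                              - lognormal_limited_expected_value(d))
    return payment_rate, prob_max, expected_cost


rate, prob_max, cost = excess_of_loss_summary(500_000.0, 1_000_000.0)
print(round(rate, 6), round(prob_max, 4), round(cost, 1))
```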


14.6.1 Distribution for a single policy 


We consider the distribution of losses for a single policy for various combinations
of d and u. We use the Poisson parameter for the combined group and
have employed the recursive algorithm with a discretization interval of 10,000 
and the method of rounding. In all cases the 90th and 99th percentiles are 
zero, indicating that most of the time the excess of loss reinsurance will involve 
no payments. This is not surprising because the probability there will be no 
losses is exp(—0.0154578) = 0.985 and with the deductible this probability is 
even higher. The mean, standard deviation, and coefficient of variation for 
various combinations of d and u are given in Table 14.13. 

It is not surprising that the risk (as measured by the coefficient of variation, 
C.V.) increases when either the deductible or the limit is increased. It is also 
clear that the risk of writing one policy is extreme. 


14.6.2 One hundred policies—excess of loss 


Table 14.13 Excess of loss reinsurance, one policy

Deductible (10^6)    Limit (10^6)    Mean     Standard deviation    C.V.
0.5                  1               778      18,858                24.24
0.5                  5               2,910    94,574                32.50
0.5                  10              3,809    144,731               38.00
0.5                  25              4,825    229,284               47.52
0.5                  50              5,415    306,359               56.58
1.0                  5               2,132    80,354                37.69
1.0                  10              3,031    132,516               43.72
1.0                  25              4,046    219,475               54.24
1.0                  50              4,636    298,101               64.30
5.0                  10              899      62,556                69.58
5.0                  25              1,914    162,478               84.89
5.0                  50              2,504    249,752               99.74
10.0                 25              1,015    111,054               109.41
10.0                 50              1,605    205,939               128.71


We next consider the possibility of reinsuring 100 policies. If we assume that
the same deductible and limit apply to all of them, the aggregate distribution
requires only that the frequency be changed. When 100 independent Poisson
random variables are added, the sum has a Poisson distribution with the
original parameter multiplied by 100. The same process was repeated with
the revised Poisson parameter. The results appear in Table 14.14.


Table 14.14 Excess of loss reinsurance, 100 policies

Deductible (10^6)    Limit (10^6)    Mean (10^3)    Standard deviation (10^3)    C.V.      Percentiles (10^3)
0.5                  5               291            946                          3.250     708    4,503
0.5                  10              381            1,447                        3.800     708    9,498
0.5                  25              [illegible]    2,293                        4.752     708    11,674
1.0                  5               213            804                          3.769     190    4,002
1.0                  10              303            1,325                        4.372     190    8,997
1.0                  25              405            2,195                        5.424     190    11,085
5.0                  10              90             626                          6.958     0      4,997
5.0                  25              191            1,625                        8.489     0      6,886
10.0                 25              102            1,111                        10.941    0      1,854

As must be the case with independent policies, the mean is 100 times the 
mean for one policy and the standard deviation is 10 times the standard 
deviation for one policy. This implies that the coefficient of variation will be 
one-tenth of its previous value. In all cases, the 99th percentile is now above 
zero. This may make it appear that there is more risk, but in reality it just 
indicates that it is now more likely that a claim will be paid. 
14.6.3 One hundred policies—aggregate stop-loss


We now turn to aggregate reinsurance. Assume policies have no individual 
deductible but do have a policy limit of u. There are again 100 policies and 
this time the reinsurer pays all aggregate losses in excess of an aggregate 
deductible of a. For a given limit, the severity distribution is modified as 
before, the Poisson parameter is multiplied by 100, and then some algorithm 
is used to obtain the aggregate distribution. Let this distribution have cdf 
F(s) or, in the case of a discretized distribution (as will be the output from 
the recursive algorithm or the FFT), a pf fs(si) for i = 1,...,n. For a 
deductible of a, the corresponding functions for the reinsurance distribution 


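$S_r$ are

$$
\begin{aligned}
F_{S_r}(s) &= F_S(s + a), \qquad s \ge 0, \\
f_{S_r}(0) &= F_S(a) = \sum_{s_i \le a} f_S(s_i), \\
f_{S_r}(r_i) &= f_S(r_i + a), \qquad r_i = s_i - a, \quad i = 1, \ldots, n.
\end{aligned}
$$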


Moments and percentiles may be determined in the usual manner. 
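The transformation from the aggregate pf to the stop-loss pf can be sketched in a few lines of Python. The function name `stop_loss_pf` and the small discretized distribution used here are hypothetical and serve only to show the mechanics; in practice the input would be the output of the recursive algorithm or the FFT.

```python
import numpy as np


def stop_loss_pf(values, probs, a):
    """Shift a discretized aggregate pf by the aggregate deductible a."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    at_or_below = values <= a
    # Aggregate losses not exceeding a produce a reinsurance payment of zero.
    r = np.concatenate(([0.0], values[~at_or_below] - a))
    f = np.concatenate(([probs[at_or_below].sum()], probs[~at_or_below]))
    return r, f


# Hypothetical discretized aggregate distribution on a span of 2.
s = np.arange(0, 16, 2)
f_s = np.array([0.12, 0.17, 0.20, 0.21, 0.16, 0.09, 0.04, 0.01])

r, f_r = stop_loss_pf(s, f_s, a=4)
mean_r = float(np.dot(r, f_r))
sd_r = float(np.sqrt(np.dot(r ** 2, f_r) - mean_r ** 2))
print(r, f_r, mean_r, sd_r)
```

Moments and percentiles of the reinsurer's payment then follow directly from the pair `(r, f_r)`.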

Using the recursive formula with an interval of 10,000, results for various 
stop-loss deductibles and individual limits are given in Table 14.15. The 
results are similar to those for the excess of loss coverage. For the most part, 
as either the individual limit or the aggregate deductible is increased, the risk, 
as measured by the coefficient of variation, increases. The exception is when 
both the limit and the deductible are 5,000,000. This is a risky setting because 
it is the only one in which two losses are required before the reinsurance will 
take effect.

Now suppose the 100 policies are known to have different Poisson parameters
(but the same severity distribution). Assume 30 have $\lambda = 0.0162249$ and
so the number of claims from this subgroup is Poisson with mean


30(0.0162249) = 0.486747.


For the second group (50 members) the parameter is 50(0.0174087) = 0.870435
and for the third group (20 members) it is 20(0.0096121) = 0.192242. There
are three methods for obtaining the distribution of the sum of the three
separate aggregate distributions.


1. Because the sum of independent Poisson random variables is still Poisson,
the total number of losses has the Poisson distribution with parameter
1.549424. The common severity distribution remains the lognormal. This
reduces to a single compound distribution which can be evaluated by any
method.

2. Obtain the three aggregate distributions separately. If the recursive or
FFT algorithms are used, the result will be three discrete distributions.
The distribution of their sum can be obtained by using convolutions.

3. If the FFT or Heckman-Meyers algorithms are used, the three transforms
can be found and then multiplied. The inverse transform is then taken of
the product.


Table 14.15 Aggregate stop-loss reinsurance, 100 policies

Deductible   Limit     Mean      Standard                  Percentiles (10^3)
  (10^6)     (10^6)    (10^3)    deviation (10^3)   C.V.       90        99
    0.5         5       322          1,003          3.11       863     4,711
    0.5        10       412          1,496          3.63       863     9,504
    0.5        25       513          2,331          4.54       863    11,895
    1.0         5       241            879          3.64       363     4,211
    1.0        10       331          1,389          4.19       363     9,004
    1.0        25       433          2,245          5.19       363    11,395
    2.5         5       114            556          4.86         0     2,711
    2.5        10       204          1,104          5.40         0     7,504
    2.5        25       306          2,013          6.58         0     9,895
    5.0         5        13            181         13.73         0       211
    5.0        10       103            714          6.93         0     5,004
    5.0        25       205          1,690          8.26         0     7,395


Each of the methods has advantages and drawbacks. The first method is 
restricted to those frequency distributions for which the sum has a known 
form. If the severity distributions are not identical, it may not be possible to 
combine them to form a single model. The major advantage is that, if it is 
available, this method requires only one aggregate calculation. 

The advantage of method 2 is that there is no restriction on the frequency
and severity components of the separate models. The drawback is the expansion of
computer storage. For example, if the first distribution requires 3,000 points, 
the second one 5,000 points, and the third one 2,000 points (with the same 
discretization interval being used for the three distributions), the combined 
distribution will require 10,000 points. More will be said about this at the 
end of this section. 

The third method also has no restriction on the separate models. It has 
the same drawback as the second method, but here the expansion must be 
done in advance. That is, in the example, all three components must work 
with 10,000 points. There is no way to avoid this. 
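Methods 2 and 3 can be compared with a short Python sketch. It assumes each aggregate distribution has already been discretized on a common span and stored as a NumPy array of probabilities; the three small vectors and the function names are hypothetical. Note how the transform method must pad every vector to the full combined length before any multiplication takes place, which is the storage issue described above.

```python
import numpy as np


def combine_by_convolution(pfs):
    """Method 2: convolve the discretized aggregate pfs one after another."""
    total = np.array([1.0])  # point mass at zero
    for pf in pfs:
        total = np.convolve(total, pf)
    return total


def combine_by_fft(pfs):
    """Method 3: multiply the transforms, then invert the product."""
    n = sum(len(pf) for pf in pfs) - len(pfs) + 1  # length of the combined vector
    product = np.ones(n // 2 + 1, dtype=complex)
    for pf in pfs:
        product *= np.fft.rfft(pf, n)  # each pf is zero-padded to length n
    return np.fft.irfft(product, n)


# Hypothetical discretized aggregate distributions on a common span.
pf_a = np.array([0.5, 0.3, 0.2])
pf_b = np.array([0.6, 0.2, 0.1, 0.1])
pf_c = np.array([0.9, 0.1])

by_conv = combine_by_convolution([pf_a, pf_b, pf_c])
by_fft = combine_by_fft([pf_a, pf_b, pf_c])
print(np.max(np.abs(by_conv - by_fft)))
```

Padding to the full length avoids the wrap-around that the discrete Fourier transform would otherwise introduce.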


14.6.4 Numerical convolutions 


The remaining problem is expansion of the number of points required when 
performing numerical convolutions. The problem arises when the individual 
distributions use a large number of discrete points, to the point where the 
storage capacity of the computer becomes an obstacle. The following example
is a small-scale version of the problem and indicates a simple solution. 


Example 14.2 The probability functions for two discrete distributions are
given below. Suppose the maximum vector allowed by the computer program
being used is of length 6. Determine an approximation to the probability
function for the sum of the two random variables.


x     f_1(x)   f_2(x)
0      0.3      0.4
2      0.2      0.3
4      0.2      0.2
6      0.2      0.1
8      0.1      0.0


The maximum possible value for the sum of the two random variables is 14 
and would require a vector of length 8 to store. Usual convolutions produce 
the answer as given below. 


x       0      2      4      6      8      10     12     14
f(x)   0.12   0.17   0.20   0.21   0.16   0.09   0.04   0.01


With 6 points available, the span must be increased to 14/5 = 2.8. We then 
do a sort of reverse interpolation, taking the probability at each point that is 
not a multiple of 2.8 and allocating it to the two nearest multiples of 2.8. For 
example, the probability of 0.16 at x = 8 is allocated to the points 5.6 and 8.4.
Because 8 is 2.4/2.8 of the way from 5.6 to 8.4, six-sevenths of the probability 
is placed at 8.4 and the remaining one-seventh is placed at 5.6. The complete 
allocation process appears in Table 14.16. The probabilities allocated to each 
multiple of 2.8 are then combined to produce the approximation to the true 
distribution of the sum. The approximating distribution is given below. 


x        0       2.8      5.6      8.4      11.2     14.0
f(x)   0.1686   0.2357   0.2886   0.2057   0.0800   0.0214


This method preserves both the total probability of one and the mean 
(both the true distribution and the approximating distribution have a mean 
of 5.2). □
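The calculations of Example 14.2 can be reproduced with the following Python sketch. The helper name `reallocate_to_span` is hypothetical, and the small tolerance used when locating the lower point is an implementation detail added here to guard against floating-point error at the last point; the allocation rule itself is the reverse interpolation described in the example.

```python
import numpy as np


def reallocate_to_span(values, probs, n_points):
    """Spread each probability over the two nearest points of a coarser grid."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    span = values.max() / (n_points - 1)
    position = values / span  # location measured in units of the new span
    lower = np.minimum(np.floor(position + 1e-9).astype(int), n_points - 1)
    upper = np.minimum(lower + 1, n_points - 1)
    weight_upper = np.clip(position - lower, 0.0, 1.0)
    new_probs = np.zeros(n_points)
    np.add.at(new_probs, lower, probs * (1.0 - weight_upper))
    np.add.at(new_probs, upper, probs * weight_upper)
    return span * np.arange(n_points), new_probs


f1 = np.array([0.3, 0.2, 0.2, 0.2, 0.1])
f2 = np.array([0.4, 0.3, 0.2, 0.1])  # the zero probability at x = 8 is dropped

exact = np.convolve(f1, f2)              # pf of the sum on 0, 2, ..., 14
x_exact = 2.0 * np.arange(len(exact))
x_new, approx = reallocate_to_span(x_exact, exact, n_points=6)

print(np.round(exact, 2))
print(np.round(x_new, 1), np.round(approx, 4))
print(np.dot(x_exact, exact), np.dot(x_new, approx))  # both means
```

Because each probability is split in proportion to distance, the total probability and the mean carry over to the coarser grid.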


Table 14.16 Allocation of probabilities for Example 14.2

 x    f(x)   Lower point   Probability   Upper point   Probability
 0    0.12       0           0.1200
 2    0.17       0           0.0486          2.8          0.1214
 4    0.20       2.8         0.1143          5.6          0.0857
 6    0.21       5.6         0.1800          8.4          0.0300
 8    0.16       5.6         0.0229          8.4          0.1371
10    0.09       8.4         0.0386         11.2          0.0514
12    0.04      11.2         0.0286         14.0          0.0114
14    0.01      14.0         0.0100


One refinement that can eliminate some of the need for storage is to note
that when a distribution requires a large vector the probabilities at the end are
likely to be very small. When they are multiplied to create the convolution,
the probabilities at the ends of the new, long vector may be so small that they
can be ignored. Thus those cells need not be retained and do not add to the
storage problem.

Many more refinements are possible. In the appendix to the article by
Bailey [9] a method which preserves the first three moments is presented.
He also provides guidance with regard to the elimination or combination of
storage locations with exceptionally small probability.


14.7 COMPREHENSIVE EXERCISES 


The exercises in this section are similar to the examples presented earlier in 
this chapter. They are based on questions that arose in published papers. 


14.2 In New York there were special funds for some infrequent occurrences 
under workers compensation insurance. One was the event of a case being 
reopened. Hipp [57] collected data on the time from an accident to when the 
case was reopened. These covered cases reopened between April 24, 1933 and 
December 31, 1936. The data appear in Table 14.17. Determine a parametric 
model for the time from accident to reopening. By definition, at least seven 
years must elapse before a claim can qualify as a reopening, so the model 
should be conditioned on the time being at least seven years. 


14.3 In the first of two papers by Arthur Bailey [6], written in 1942 and 
1943, he observed on page 51 that “Another field where a knowledge of sam- 
pling distributions could be used to advantage is that of rating procedures 
for deductibles and excess coverages.” In the second paper [7], he presented 
some data (Table 14.18) on the distribution of loss ratios. In that paper he 
made the statement that the popular lognormal model provided a good fit 
and passed the chi-square test. Does it? Is there a better model? 
Table 14.17 Time to reopening of a workers compensation claim for Exercise 14.2


Years No. reopened Years No. reopened 
7-8 27 15-16 13 
8-9 43 16-17 9 
9-10 42 17-18 7 
10-11 37 18-19 4 
11-12 25 19-20 4 
12-13 19 20-21 1 
13-14 23 21+ 0 
14-15 10 

Total 264 


Table 14.18 Loss ratio data for Exercise 14.3 


Loss ratio Number 
0.0-0.2 16 
0.2-0.4 27 
0.4-0.6 22 
0.6-0.8 29 
0.8-1.0 19 
1.0-1.5 32
1.5-2.0 10 
2.0-3.0 13 
3.0+ 5 
Total 173 


14.4 In 1979, Hewitt and Lefkowitz [56] looked at automobile bodily injury 
liability data (Table 14.19) and concluded that a two-point mixture of the 
gamma and loggamma distributions [If X has a gamma distribution, then 
Y = exp(X) has the loggamma distribution. Note that its support begins at 
1] was superior to the lognormal. Do you agree? Also consider the gamma 
and loggamma distributions.


14.5 A 1980 paper by Patrik [102] contained many of the ideas recommended 
in this text. One of his examples was data supplied by the Insurance Services 
Office on Owners, Landlords, and Tenants bodily injury liability. Policies 
at two different limits were studied. Both were for policy year 1976 with 
losses developed to the end of 1978. The groupings in Table 14.20 have been 
condensed from those in the paper. Can the same model (with or without 
identical parameters) be used for the two limits? 
Table 14.19 Automobile bodily injury liability losses for Exercise 14.4

Loss        Number     Loss           Number
0-50          27       750-1,000         8
50-100         4       1,000-1,500      16
100-150        1       1,500-2,000       8
150-200        2       2,000-2,500      11
200-250        3       2,500-3,000       6
250-300        4       3,000-4,000      12
300-400        5       4,000-5,000       9
400-500        6       5,000-7,500      14
500-750       13       7,500-           40

Total        189


Table 14.20 OLT bodily injury liability losses for Exercise 14.5

Loss (10^3)   300 Limit   500 Limit     Loss (10^3)   300 Limit   500 Limit
0-0.2            10,075       3,977     11-12                56          22
0.2-0.5           3,049       1,095     12-13                47          23
0.5-1             3,263       1,152     13-14                20           6
1-2               2,690         991     14-15               151          51
2-3               1,498         594     15-20               151          54
3-4                 964         339     20-25               109          44
4-5                 794         307     25-50               154          53
5-6                 261         103     50-75                24          14
6-7                 191          79     75-100               19           5
7-8                 406         141     100-200              22           6
8-9                 114          52     200-300               6           9
9-10                279          89     300-500            10^a           3
10-11                58          23     500-                              0
Totals           24,411       9,232

^a Losses for 300+


14.6 The data in Table 14.21 were collected by Fisher [37] on coal mining 
disasters in the United States over 25 years ending about 1910. This particular 
compilation counted the number of disasters per year that claimed the lives of 
five to nine miners. In the article, Fisher claimed that a Poisson distribution 
was a good model. Is it? Is there a better model? 


14.7 Harwayne [49] was curious as to the relationship between driving record
and number of accidents. His data on California drivers included the number
of violations. For each of the six data sets represented by each column in
Table 14.22, is a negative binomial distribution appropriate? If so, are the
same parameters appropriate? Is it reasonable to conclude that the expected
number of accidents increases with the number of violations?


Table 14.21 Mining disasters per year for Exercise 14.6

No. of disasters   No. of years     No. of disasters   No. of years
       0                1                  7                3
       1                1                  8                1
       2                3                  9                0
       3                4                 10                1
       4                5                 11                1
       5                2                 12                1
       6                2                 13+               0


Table 14.22 Number of accidents by number of violations for Exercise 14.7

Number of                    No. of violations
accidents        0        1        2        3        4       5+
    0       51,365   17,081    6,729    3,098    1,548    1,893
    1        3,997    3,131    1,711      963      570      934
    2          357      353      266      221      138      287
    3           34       41       44       31       34       66
    4            4        6        6        6        4       14
    5+           0        1        1        1        3        1


Table 14.23 Number of accidents per year for Exercise 14.8

No. of accidents   No. of stretches     No. of accidents   No. of stretches
       0                  9                    6                  4
       1                 65                    7                  0
       2                 57                    8                  3
       3                 35                    9                  4
       4                 20                   10                  0
       5                 10                   11                  1


14.8 In 1961, Simon [122] proposed using the zero-modified negative binomial 
distribution. His data set was the number of accidents in one year along 
various one-mile stretches of Oregon highway. The data appear in Table 
14.23. Simon claimed that the zero-modified negative binomial distribution 
was superior to the negative binomial. Is he correct? Is there a better model? 


Part V

Adjusted estimates and simulation


15 Interpolation and smoothing

15.1 INTRODUCTION 


Methods of model building discussed to this point are based on ideas that came 
primarily from the fields of probability and statistics. Data are considered to 
be observations from a sample space associated with a probability distribution. 
The quantities to be estimated are functions of that probability distribution,
for example, the pdf, cdf, hazard rate (force of mortality), mean, and variance.

In contrast, the methods described in this chapter have their origins in 
the field of numerical analysis, without specific consideration of probabilistic
or statistical concepts.

In practice, many of these numerical methods have been subsequently 
adapted to a probability and statistics framework. Although the key ideas 
of the methods are easy to understand, most of these techniques are com- 
putationally demanding, thus requiring computer programs. The techniques 
described in this chapter are at the lowest end of the complexity scale. 

The objective is to fit a smooth curve through a set of data according to 
some specified criteria. This has many applications in actuarial science as it 
has in many other fields. We begin with a set of distinct points in the plane. In 
practice these points represent a sequence of successive observations of some 
quantity, for example, a series of successive monthly inflation rates, a set of 
successive average annual claim costs, or a set of successive observed mortality
rates by age. The methods in this chapter are considered to be nonparametric 
in nature in the sense that the underlying model is not prespecified by a simple 
mathematical function with a small number of parameters. The methods in 
this chapter allow for great flexibility in the shape of the resulting curve. They 
are especially useful in situations where the shape is complex. 

One such example is the curve representing the probabilities of death within 
a short period for humans, such as the function $q_x$. These probabilities decrease
sharply at the youngest ages as a result of neonatal deaths, are relatively
flat until the early teens, rise slowly during the teens, rise and then fall (es- 
pecially for males) during the 18-25 age range (as a result of accidents), then 
continue to rise slowly but at an increasing rate for higher ages. This curve 
is not captured adequately by a simple function (although there are models 
with eight or more parameters available). 

Historically, the process of smoothing a set of observed irregular points 
is called graduation. The set of points typically represents observed rates 
of mortality (probability of death within one year) or rates of some other 
contingency such as disablement, unemployment, or accident. The methods 
described in this chapter are not restricted to these kinds of applications. 
Indeed, they can be applied to any set of successive points.

In graduation theory, it is assumed that there is some underlying, but 
unobservable, true curve or function that is to be estimated or approximated.
Graduation depends on a trade-off between the high degree of fit that is 
obtained by a “noisy” curve such as a high-degree polynomial that fits the 
data well and the high degree of smoothness that is obtained by a simple 
curve such as a straight line or an exponential curve. 

There are a number of classical methods described in older actuarial text- 
books such as Miller [94]. These include simple graphical methods using an 
engineering draftsman’s French curve or a spline and weights. A French curve 
is a flat piece of wood with a smooth outside edge, with the diameter of the 
outside edge changing gradually. This could be used to draw curves through 
specified points. A spline was a thin rod of flexible metal or plastic that 
was anchored by attaching lead weights called ducks at specified points along 
the rod. By altering the position of the ducks on the rod and moving the 
rod relative to the drafting surface, smooth curves could be drawn through 
successive sets of points. The resulting shape of the rod is the one that mini- 
mizes the energy of deflection subject to the rod passing through the specified 
points. In that sense it is a very natural method for developing the shape of 
a structure so that it has maximal strength. Methods developed by actuaries 
included mathematical methods based on running averages, methods based 
on interpolation, and methods based directly on finding a balance between 
fit and smoothness. All these methods were developed in the early 1900s, 
some even earlier. They were developed using methods of finite differences, in 
which it was frequently assumed that fourth and higher differences should be 
set to zero, implicitly forcing the use of third-degree polynomials. Formulas 
involving differences were developed so that an actuary could develop smooth
functions using only pencil and paper. Remember these formulas were devel- 
oped long before calculators (mechanical or electronic!) and very long before 
computers were developed. A more recent summary of these methods along 
with some updated variations can be found in London [84]. 

With the advent of computers in the 1950s and 1960s, many computerized 
mathematical procedures were developed. Among them was the theory of 
splines, this time not mechanical in nature. As with graduation, the objective 
of splines is to find an appropriate balance between fit and smoothness. The 
solutions that were developed were in terms of linear systems of equations 
that could be easily solved on a computer. The modern theory of splines 
dates back to Schoenberg [117]. 

In this chapter, we focus only on the modern techniques of spline interpo- 
lation and smoothing. These techniques are so powerful and flexible that they 
have largely superseded the older methods. 


15.2 POLYNOMIAL INTERPOLATION AND SMOOTHING 


Consider n + 1 distinct points labeled $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$
with $x_0 < x_1 < x_2 < \cdots < x_n$. A unique polynomial of degree at most n
can be passed through these points. This polynomial is called a collocation
polynomial and can be expressed as

$$ f(x) = \sum_{j=0}^{n} a_j x^j, \tag{15.1} $$

where

$$ f(x_j) = y_j, \qquad j = 0, 1, \ldots, n. \tag{15.2} $$

Equations (15.2) form a system of n + 1 equations in n + 1 unknowns
$\{a_j;\ j = 0, 1, \ldots, n\}$. However, when n is large, the numerical exercise
of solving the system of equations may be difficult.
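Fortunately, the solution can be explicitly written without solving the
system of equations. The solution is known as Lagrange's formula:

$$
\begin{aligned}
f(x) &= y_0 \frac{(x - x_1)(x - x_2)\cdots(x - x_n)}{(x_0 - x_1)(x_0 - x_2)\cdots(x_0 - x_n)}
      + y_1 \frac{(x - x_0)(x - x_2)\cdots(x - x_n)}{(x_1 - x_0)(x_1 - x_2)\cdots(x_1 - x_n)} \\
     &\quad + \cdots
      + y_n \frac{(x - x_0)(x - x_1)\cdots(x - x_{n-1})}{(x_n - x_0)(x_n - x_1)\cdots(x_n - x_{n-1})} \\
     &= \sum_{j=0}^{n} y_j
        \frac{(x - x_0)\cdots(x - x_{j-1})(x - x_{j+1})\cdots(x - x_n)}
             {(x_j - x_0)\cdots(x_j - x_{j-1})(x_j - x_{j+1})\cdots(x_j - x_n)}.
\end{aligned}
\tag{15.3}
$$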
Table 15.1 Mortality rates for Example 15.1


Estimated 

Exposed Actual Mortality Rate 

j Ages to Risk Deaths Per 1,000 
0 25-29 35,700 139 3.89 
1 30-34 244,066 599 2.45 
2 35-39 741,041 1,842 2.49 
3 40-44 1,250,601 4,771 3.81 
4 45-49 1,746,393 11,073 6.34 
5 50-54 2,067,008 21,693 10.49
6 55-59 1,983,710 31,612 15.94 
7 60-64 1,484,347 39,948 26.91 
8 65-69 988,980 40,295 40.74 
9 70-74 559,049 33,292 59.55 
10 75-79 241,497 20,773 86.02 
11 80-84 78,229 11,376 145.42 
12 85-89 15,411 2,653 172.15 
13 90-94 2,552 589 230.80 
14 95- 162 44 271.60 

Total 11,438,746 220,699 
To verify that (15.3) is the collocation polynomial, note that each term is a
polynomial of degree n and that when $x = x_j$ the right-hand side of (15.3)
takes on value $y_j$ for each of j = 0, 1, 2, ..., n.

The n-degree polynomial f(x) provides interpolation between $(x_0, y_0)$ and
$(x_n, y_n)$ and passes through all interior points $\{(x_j, y_j);\ j = 1, \ldots, n - 1\}$.
However, for large n, the function f(x) can exhibit excessive oscillation; that
is to say, it can be very “wiggly.” This is particularly problematic when there
is some “noise” in the original series $\{(x_j, y_j);\ j = 0, \ldots, n\}$. Such noise can
be caused by measurement error or random fluctuation.


Example 15.1 The data in Table 15.1 are from Miller [94], p. 62. They
are observed mortality rates in five-year age groups. The estimated mortality
rates are obtained as the ratio of the dollars of death claims paid to the total
dollars exposed to death.¹ The rates are plotted in Figure 15.1.

The estimates of mortality rates at each age are the maximum likelihood
estimates of the true rates assuming mutually independent binomial models
at each age. Note that there is considerable variability in successive estimates.
Of course, mortality rates are expected to be relatively smooth from age to
age. Figure 15.1 shows the observed mortality rates connected by straight lines
while Figure 15.2 shows a collocation polynomial fitted through the observed
rates. Notice its wiggly form and its extreme oscillation near the ends. □


¹ Deaths and exposures are in units of $1,000. It is common in mortality studies to count
dollars rather than lives in order to give more weight to the larger policies. The mortality
rates in the table are the ratios of the given deaths and exposures. The last entry differs
from Miller’s table due to rounding.


Fig. 15.2 Collocation polynomial for mortality data. (Vertical axis: mortality rate.)


To avoid the excessive oscillatory behavior or wiggliness, lower order
polynomials could be used for interpolation. For example, successive values could
be joined by straight lines. However, the successive interpolating lines form a
jagged series because of the “kinks” at the points of juncture.

Another method is to piece together a sequence of low-degree polynomials.
For example, a quadratic function can be collocated with successive points
at $(x_0, x_1, x_2), (x_2, x_3, x_4), \ldots$. However, there will not be smoothness at the
points of juncture $x_2, x_4, \ldots$ in the sense that the interpolating function will
have kinks at these points with slopes and curvature not matching. One way
to get rid of the kinks is to force some left-hand and right-hand derivatives
to be equal at these points. This creates apparent smoothness at the points
of juncture of the successive polynomials. This is the key idea behind splines.

Interpolating splines are piecewise polynomial functions that pass through the 
given data points but that have the added feature that they are smooth at 
the points of juncture of the successive pieces. The order of the polynomial is 
kept low to minimize “wiggly” behavior. Interpolation using cubic splines is 
introduced in Section 15.3. 

An alternative to interpolation is smoothing, or, more precisely, fitting a 
smooth function to the observed data but not requiring that the function pass 
through each data point. Polynomials allow for great flexibility of shapes. 
However, this flexibility of shape also makes polynomials quite risky to use 
for extrapolation, especially for polynomials of high degree. This was the 
case in Figure 15.2, where the extrapolated values, even for one year, were 
completely unreliable. As with the fitting of other models earlier in this book, 
a fitting criterion needs to be selected in order to fit a model. We will illustrate 
the use of polynomial smoothing by using a least squares criterion. Figures 
15.3-15.6 show the fits of polynomials of degree 2, 3, 4, and 5 to the data of 
Example 15.1. It should be noted that the fit improves with each increase 
in degree because there is one additional degree of freedom in carrying out 
the fit. However, it can be seen that as each degree is added the behavior 
of the extrapolated values for only a few years below age 27 and above age 
97 changes quite significantly. Smoothing splines provide one solution to this 
dilemma. Smoothing splines are just like interpolating splines except that the 
spline is not required to pass through the data points but, rather, should be 
close to the data points. Cubic splines limit the degree of the polynomial to 
3. 
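The effect of adding degrees can be explored with a short least squares sketch in Python. Several choices here are assumptions made only for illustration: the rates are taken from Table 15.1, each age group is represented by its central age (27, 32, ..., 97), the fit is unweighted, and the ages are centered and rescaled before calling NumPy's `polyfit` to keep the computation well conditioned. The fits behind Figures 15.3 through 15.6 may have been produced differently.

```python
import numpy as np

# Assumed central ages for the 15 five-year groups of Table 15.1.
ages = np.arange(27, 98, 5, dtype=float)
rates = np.array([3.89, 2.45, 2.49, 3.81, 6.34, 10.49, 15.94, 26.91,
                  40.74, 59.55, 86.02, 145.42, 172.15, 230.80, 271.60])

center, scale = ages.mean(), 35.0


def rescale(age):
    """Map ages to roughly [-1, 1] to keep the least squares problem stable."""
    return (age - center) / scale


sse = {}
extrapolated = {}
for degree in (2, 3, 4, 5):
    coeffs = np.polyfit(rescale(ages), rates, degree)  # unweighted least squares
    fitted = np.polyval(coeffs, rescale(ages))
    sse[degree] = float(np.sum((rates - fitted) ** 2))
    # Value of the fitted polynomial a few years beyond the data.
    extrapolated[degree] = float(np.polyval(coeffs, rescale(100.0)))

for degree in sse:
    print(degree, round(sse[degree], 2), round(extrapolated[degree], 2))
```

The sum of squared errors cannot increase as the degree rises, while the extrapolated values at age 100 may differ noticeably from one degree to the next.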


15.2.1 Exercises 


15.1 Determine the equation of the polynomial that interpolates the points 
(2,50), (4,25), and (5, 20). 


15.2 Determine the equation of the straight line that best fits the data of 
Exercise 15.1 using the least squares criterion. 
Fig. 15.3 Second-degree polynomial fit. (Vertical axis: mortality rate.)


Fig. 15.4 Third-degree polynomial fit.


15.3 CUBIC SPLINE INTERPOLATION 


Cubic splines are piecewise cubic functions that have the property that the 
first and second derivatives can be forced to be continuous, unlike the approach 
of successive polynomials with jagged points of juncture. 

Cubic splines are used extensively in computer-aided design and manufac- 
turing in creating surfaces that are smooth to the touch and to the eye. The 
cubic spline is fitted to a series of points, called knots, that give the basic 
shape of the object being designed or manufactured. 
Fig. 15.5 Fourth-degree polynomial fit.

Fig. 15.6 Fifth-degree polynomial fit.


In the terminology of graduation theory as developed by actuaries in the 
early 1900s, cubic spline interpolation is called osculatory interpolation.²

Programs for cubic splines are included in many mathematical and engi- 
neering software packages. This makes them very easy to apply. 


² The word osculation means “the act of kissing.” Successive cubic polynomials exhibit
osculatory behavior by “kissing” each other smoothly at the knots!




Definition 15.2 Suppose that $\{(x_j, y_j);\ j = 0, \ldots, n\}$ are n + 1 distinct knots
with $x_0 < x_1 < x_2 < \cdots < x_n$. The function f(x) is a cubic spline if there
exist n cubic polynomials $f_j(x)$ with coefficients $a_j$, $b_j$, $c_j$, and $d_j$ that satisfy:


I. $f(x) = f_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3$ for $x_j \le x \le x_{j+1}$
and j = 0, 1, ..., n - 1.

II. $f(x_j) = y_j$, j = 0, 1, ..., n.

III. $f_j(x_{j+1}) = f_{j+1}(x_{j+1})$, j = 0, 1, 2, ..., n - 2.

IV. $f_j'(x_{j+1}) = f_{j+1}'(x_{j+1})$, j = 0, 1, 2, ..., n - 2.

V. $f_j''(x_{j+1}) = f_{j+1}''(x_{j+1})$, j = 0, 1, 2, ..., n - 2.


Property I states that f(x) consists of piecewise cubics. Property II states
that the piecewise cubics pass through the given set of data points. Property 
III requires the spline to be continuous at the interior data points. Properties 
IV and V provide smoothness at the interior data points by forcing the first 
and second derivatives to be continuous. 


15.3.1 Construction of cubic splines 


Each cubic polynomial has four unknown constants: $a_j$, $b_j$, $c_j$, and $d_j$. Because
there are n such cubics, there are 4n coefficients to be determined. Properties
II-V provide n + 1, n - 1, n - 1, and n - 1 conditions, respectively, for a total of
4n - 2 conditions. In order to determine the 4n coefficients, we need exactly
two more conditions. This can be done by adding two endpoint constraints
involving some of $f'(x)$, $f''(x)$, or $f'''(x)$ at $x_0$ and $x_n$. Different choices
of endpoint constraints lead to different results. Various possible endpoint
constraints are discussed in the next section.

In order to construct the cubic segments in the successive intervals, first
consider the second derivative $f_j''(x)$. It is a linear function because $f_j(x)$ is
cubic. Therefore, the Lagrangian representation of the second derivative is

$$ f_j''(x) = f''(x_j)\,\frac{x - x_{j+1}}{x_j - x_{j+1}} + f''(x_{j+1})\,\frac{x - x_j}{x_{j+1} - x_j}. \tag{15.4} $$

To simplify notation, let $m_j = f''(x_j)$ and $h_j = x_{j+1} - x_j$, so that

$$ f_j''(x) = \frac{m_j}{h_j}(x_{j+1} - x) + \frac{m_{j+1}}{h_j}(x - x_j) \tag{15.5} $$

for $x_j \le x \le x_{j+1}$ and j = 0, 1, ..., n - 1.
Integrating this twice leads to

$$ f_j(x) = \frac{m_j}{6h_j}(x_{j+1} - x)^3 + \frac{m_{j+1}}{6h_j}(x - x_j)^3 + p_j(x_{j+1} - x) + q_j(x - x_j), \tag{15.6} $$

where $p_j$ and $q_j$ are undetermined constants of integration. [To check this,
just differentiate (15.6) twice.]
Substituting $x_j$ and $x_{j+1}$ into (15.6) yields

$$ y_j = \frac{m_j}{6} h_j^2 + p_j h_j \tag{15.7} $$

and

$$ y_{j+1} = \frac{m_{j+1}}{6} h_j^2 + q_j h_j \tag{15.8} $$

because $f_j(x_j) = y_j$ and $f_j(x_{j+1}) = y_{j+1}$.
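We now obtain the constants $p_j$ and $q_j$ from (15.7) and (15.8), namely
$p_j = y_j/h_j - m_j h_j/6$ and $q_j = y_{j+1}/h_j - m_{j+1} h_j/6$. When they
are substituted into (15.6), we obtain

$$
\begin{aligned}
f_j(x) &= \frac{m_j}{6h_j}(x_{j+1} - x)^3 + \frac{m_{j+1}}{6h_j}(x - x_j)^3 \\
       &\quad + \left(\frac{y_j}{h_j} - \frac{m_j h_j}{6}\right)(x_{j+1} - x)
        + \left(\frac{y_{j+1}}{h_j} - \frac{m_{j+1} h_j}{6}\right)(x - x_j).
\end{aligned}
\tag{15.9}
$$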


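Note that the $m_j = f''(x_j)$ terms are still unknown. To obtain them,
differentiate (15.9),

$$
\begin{aligned}
f_j'(x) &= -\frac{m_j}{2h_j}(x_{j+1} - x)^2 + \frac{m_{j+1}}{2h_j}(x - x_j)^2 \\
        &\quad - \left(\frac{y_j}{h_j} - \frac{m_j h_j}{6}\right)
         + \left(\frac{y_{j+1}}{h_j} - \frac{m_{j+1} h_j}{6}\right).
\end{aligned}
\tag{15.10}
$$

Now setting $x = x_j$ yields, after simplification,

$$ f_j'(x_j) = -\frac{m_j}{3} h_j - \frac{m_{j+1}}{6} h_j + \frac{y_{j+1} - y_j}{h_j}. \tag{15.11} $$

Replacing j by j - 1 in (15.10) and setting $x = x_j$,

$$ f_{j-1}'(x_j) = \frac{m_{j-1}}{6} h_{j-1} + \frac{m_j}{3} h_{j-1} + \frac{y_j - y_{j-1}}{h_{j-1}}. \tag{15.12} $$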

Now, Property IV forces the slopes to be equal at each knot. This requires us
to equate the right-hand sides of (15.11) and (15.12), yielding the following
relation between successive values $m_{j-1}$, $m_j$, and $m_{j+1}$:

$$ h_{j-1} m_{j-1} + 2(h_{j-1} + h_j) m_j + h_j m_{j+1} = 6\left(\frac{y_{j+1} - y_j}{h_j} - \frac{y_j - y_{j-1}}{h_{j-1}}\right) \tag{15.13} $$

for j = 1, 2, ..., n - 1.
The system of equations (15.13) consists of n - 1 equations in n + 1 unknowns
$m_0, m_1, \ldots, m_n$. Two endpoint constraints can be added to determine $m_0$
and $m_n$. Obtaining the n - 1 remaining unknowns in (15.13) then allows
for complete determination of the cubic (15.9) for j = 0, 1, 2, ..., n - 1 and
therefore the entire cubic spline.

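For purposes of notational simplicity, we can rewrite (15.13) as

\[ h_{j-1}m_{j-1} + g_j m_j + h_j m_{j+1} = u_j, \quad j = 1, 2, \dots, n-1, \tag{15.14} \]

where

\[ u_j = 6\left(\frac{y_{j+1} - y_j}{h_j} - \frac{y_j - y_{j-1}}{h_{j-1}}\right) \quad \text{and} \quad g_j = 2(h_{j-1} + h_j). \tag{15.15} \]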


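When the endpoints $m_0$ and $m_n$ are determined externally, the system
(15.14) can be rewritten in matrix notation as

\[
\begin{bmatrix}
g_1 & h_1 & 0 & \cdots & \cdots & 0 \\
h_1 & g_2 & h_2 & 0 & & \vdots \\
0 & h_2 & g_3 & h_3 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
\vdots & & 0 & h_{n-3} & g_{n-2} & h_{n-2} \\
0 & \cdots & \cdots & 0 & h_{n-2} & g_{n-1}
\end{bmatrix}
\begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ \vdots \\ m_{n-2} \\ m_{n-1} \end{bmatrix}
=
\begin{bmatrix} u_1 - h_0 m_0 \\ u_2 \\ \vdots \\ \vdots \\ u_{n-2} \\ u_{n-1} - h_{n-1} m_n \end{bmatrix},
\tag{15.16}
\]

where the matrix is $(n-1) \times (n-1)$ and both vectors are $(n-1) \times 1$, or as

\[ \mathbf{H}\mathbf{m} = \mathbf{v}. \tag{15.17} \]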


The matrix $\mathbf{H}$ is tridiagonal and invertible. Thus the system (15.17) has a
unique solution $\mathbf{m} = \mathbf{H}^{-1}\mathbf{v}$. Alternatively, the system can be solved manually
using Gaussian elimination.
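
As a rough illustration of (15.14)-(15.17) for the natural case ($m_0 = m_n = 0$), the following Python sketch builds the tridiagonal system and solves it by forward elimination and back substitution, which is Gaussian elimination specialized to a tridiagonal matrix. The function name and the sample knots are assumptions made for this illustration; they do not come from the text.

```python
def natural_spline_second_derivatives(x, y):
    """Return [m_0, ..., m_n] for the natural cubic spline through (x_j, y_j)."""
    n = len(x) - 1
    if n < 2 or len(y) != len(x):
        raise ValueError("need at least three knots and matching x, y lengths")
    h = [x[j + 1] - x[j] for j in range(n)]

    # Interior equations (15.14) for j = 1, ..., n-1; list index i = j - 1.
    sub = [h[j - 1] for j in range(1, n)]                   # multiplies m_{j-1}
    diag = [2.0 * (h[j - 1] + h[j]) for j in range(1, n)]   # g_j
    sup = [h[j] for j in range(1, n)]                       # multiplies m_{j+1}
    rhs = [6.0 * ((y[j + 1] - y[j]) / h[j] - (y[j] - y[j - 1]) / h[j - 1])
           for j in range(1, n)]                            # u_j (m_0 = m_n = 0)

    # Forward elimination removes the sub-diagonal.
    for i in range(1, n - 1):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]

    # Back substitution recovers m_{n-1}, ..., m_1.
    inner = [0.0] * (n - 1)
    inner[-1] = rhs[-1] / diag[-1]
    for i in range(n - 3, -1, -1):
        inner[i] = (rhs[i] - sup[i] * inner[i + 1]) / diag[i]
    return [0.0] + inner + [0.0]


# Hypothetical knots, used only to show the call.
m = natural_spline_second_derivatives([0.0, 1.0, 3.0, 6.0], [1.0, 2.0, 0.0, 3.0])
```

Because the matrix is tridiagonal, the work grows linearly with the number of knots, which is why a full matrix inverse is not needed in practice.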
Once the values $m_1, m_2, \dots, m_{n-1}$ are determined, the values of $c_j$ are
determined by

\[ c_j = \frac{m_j}{2}, \quad j = 0, 1, \dots, n-1. \]

Property II specifies that

\[ a_j = y_j, \quad j = 0, \dots, n-1. \]

Property V specifies that

\[ m_j + 6d_j h_j = m_{j+1}, \quad j = 0, \dots, n-2, \]

yielding

\[ d_j = \frac{m_{j+1} - m_j}{6h_j}, \quad j = 0, \dots, n-2. \]

Property III specifies that

\[ a_j + b_j h_j + c_j h_j^2 + d_j h_j^3 = y_{j+1}, \quad j = 0, \dots, n-2. \]

Substituting for $a_j$, $c_j$, and $d_j$ yields

\[ b_j = \frac{y_{j+1} - y_j}{h_j} - \frac{h_j(2m_j + m_{j+1})}{6}, \quad j = 0, \dots, n-2. \]
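Summarizing, the spline coefficients for the first $n-1$ spline segments are
computed as

\[
\begin{aligned}
a_j &= y_j, \\
b_j &= \frac{y_{j+1} - y_j}{h_j} - \frac{h_j(2m_j + m_{j+1})}{6}, \\
c_j &= \frac{m_j}{2}, \\
d_j &= \frac{m_{j+1} - m_j}{6h_j},
\end{aligned}
\qquad j = 0, \dots, n-2. \tag{15.18}
\]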


The only remaining issue in order to obtain the cubic spline is the choice of
the two endpoint constraints. There are several possible choices. Once the
two endpoint constraints are selected, the $n$ cubics are fully specified. Thus
the values of $b_{n-1}$, $c_{n-1}$, and $d_{n-1}$ can also be obtained using (15.18).
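
The relations in (15.18) translate directly into code once all of $m_0, \dots, m_n$ are known. The sketch below is illustrative only: the function names are assumptions, and the sample second derivatives (18.25, 1, 2.5) are those implied by the clamped spline of Example 15.3, namely $2c_0$, $2c_1$, and $2(c_1 + 3d_1h_1)$.

```python
def spline_coefficients(x, y, m):
    """Return one (a_j, b_j, c_j, d_j) tuple per segment, following (15.18)."""
    coeffs = []
    for j in range(len(x) - 1):
        h = x[j + 1] - x[j]
        a = y[j]
        b = (y[j + 1] - y[j]) / h - h * (2.0 * m[j] + m[j + 1]) / 6.0
        c = m[j] / 2.0
        d = (m[j + 1] - m[j]) / (6.0 * h)
        coeffs.append((a, b, c, d))
    return coeffs


def evaluate_spline(x, coeffs, t):
    """Evaluate the piecewise cubic at a point t inside [x[0], x[-1]]."""
    if not x[0] <= t <= x[-1]:
        raise ValueError("t lies outside the knot range")
    j = 0
    while j < len(coeffs) - 1 and t > x[j + 1]:
        j += 1                      # locate the segment containing t
    a, b, c, d = coeffs[j]
    s = t - x[j]
    return a + s * (b + s * (c + s * d))


# Knots of Example 15.3 with the second derivatives of its clamped spline.
coeffs = spline_coefficients([2, 4, 5], [50, 25, 20], [18.25, 1.0, 2.5])
```

With these inputs the two tuples agree with the coefficients obtained from first principles in Example 15.3, and evaluating the spline at the knots returns 50, 25, and 20.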


Case 1: Natural Cubic Spline ($m_0 = m_n = 0$)

The natural spline is obtained by setting $m_0$ and $m_n$ to zero in (15.16).
Because $m_0$ and $m_n$ are the second derivatives at the endpoints, the choice of
zero minimizes the oscillatory behavior at both ends. It also makes the spline
linear beyond the boundary knots, a property that minimizes oscillatory
behavior beyond both ends of the data. This is probably safest for extrapolation
beyond the data points in most applications. Note that the second-derivative
endpoint constraints do not in themselves restrict the slopes at the endpoints.


Case 2: Curvature-Adjusted Cubic Spline ($m_0$ and $m_n$ fixed)

It is similarly possible to fix the endpoint second derivatives $m_0$ and $m_n$ to
prespecified values $f''(x_0)$ and $f''(x_n)$, respectively. Then (15.16) can again be
used directly to obtain the values of $m_1, m_2, \dots, m_{n-1}$. However, in practice,
this is difficult to do without some judgment. It is suggested that the natural
spline is a good place to start. If more curvature at the ends is wanted, it can
be added using this procedure.


Other endpoint constraints may be a bit more complicated and may require 
modification of the first and last of the system of equations (15.14), which will 
result in changes in the matrix H and the vector v in (15.17). 


Case 3: Parabolic Runout Spline ($m_0 = m_1$, $m_n = m_{n-1}$)

Reducing the cubic functions on the first and last intervals to quadratics
adds two more constraints, $d_0 = 0$ and $d_{n-1} = 0$. This results in the second
derivatives being identical at both ends of the first and last intervals; that is,
$m_0 = m_1$ and $m_n = m_{n-1}$. As a result, the first and last equations of (15.14)
are replaced by

\[
\begin{aligned}
(3h_0 + 2h_1)m_1 + h_1 m_2 &= u_1, \\
h_{n-2}m_{n-2} + (2h_{n-2} + 3h_{n-1})m_{n-1} &= u_{n-1}.
\end{aligned}
\tag{15.19}
\]


Case 4: Cubic Runout Spline

This method requires the cubic over $[x_0, x_1]$ to be an extension of that
over $[x_1, x_2]$, thus imposing the same cubic function over the entire interval
$[x_0, x_2]$. This is also known as the not-a-knot condition. A similar condition
is imposed at the other end.

This can be achieved by requiring that the third derivatives also agree
at $x_1$ and $x_{n-1}$; that is,

\[ f_0'''(x_1) = f_1'''(x_1) \]

and

\[ f_{n-2}'''(x_{n-1}) = f_{n-1}'''(x_{n-1}). \]

Because the third derivative is then constant throughout $[x_0, x_2]$ and also
throughout $[x_{n-2}, x_n]$, the second derivative will be a linear function
throughout the same two intervals. Hence, the slope of the second derivative will be
the same in any subintervals within $[x_0, x_2]$ and within $[x_{n-2}, x_n]$. Thus, we
can write

\[
\frac{m_1 - m_0}{h_0} = \frac{m_2 - m_1}{h_1}, \qquad
\frac{m_n - m_{n-1}}{h_{n-1}} = \frac{m_{n-1} - m_{n-2}}{h_{n-2}},
\]

or equivalently

\[
\begin{aligned}
m_0 &= m_1 - \frac{h_0(m_2 - m_1)}{h_1}, \\
m_n &= m_{n-1} + \frac{h_{n-1}(m_{n-1} - m_{n-2})}{h_{n-2}}.
\end{aligned}
\tag{15.20}
\]

Then the first and last equations of (15.14) are replaced by

\[
\begin{aligned}
\left(3h_0 + 2h_1 + \frac{h_0^2}{h_1}\right)m_1 + \left(h_1 - \frac{h_0^2}{h_1}\right)m_2 &= u_1, \\
\left(h_{n-2} - \frac{h_{n-1}^2}{h_{n-2}}\right)m_{n-2} + \left(2h_{n-2} + 3h_{n-1} + \frac{h_{n-1}^2}{h_{n-2}}\right)m_{n-1} &= u_{n-1}.
\end{aligned}
\tag{15.21}
\]

Case 5: Clamped Cubic Spline
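This procedure fixes the slopes $f_0'(x_0)$ and $f_{n-1}'(x_n)$ of the spline at each
endpoint. In this case, from (15.11) and (15.12), the second derivatives are

\[
\begin{aligned}
m_0 &= \frac{3}{h_0}\left(\frac{y_1 - y_0}{h_0} - f_0'(x_0)\right) - \frac{m_1}{2}, \\
m_n &= \frac{3}{h_{n-1}}\left(f_{n-1}'(x_n) - \frac{y_n - y_{n-1}}{h_{n-1}}\right) - \frac{m_{n-1}}{2}.
\end{aligned}
\tag{15.22}
\]

As a result the first and last equations of (15.14) are replaced by

\[ \left(\tfrac{3}{2}h_0 + 2h_1\right)m_1 + h_1 m_2 = u_1 - 3\left(\frac{y_1 - y_0}{h_0} - f_0'(x_0)\right) \]

and

\[ h_{n-2}m_{n-2} + \left(2h_{n-2} + \tfrac{3}{2}h_{n-1}\right)m_{n-1} = u_{n-1} - 3\left(f_{n-1}'(x_n) - \frac{y_n - y_{n-1}}{h_{n-1}}\right), \]

respectively.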
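
For computation, one convenient arrangement (a choice made for this sketch, not the text's own procedure) is to keep $m_0$ and $m_n$ as unknowns and append the two relations (15.22), rearranged as $2h_0m_0 + h_0m_1 = 6[(y_1 - y_0)/h_0 - f_0'(x_0)]$ and $h_{n-1}m_{n-1} + 2h_{n-1}m_n = 6[f_{n-1}'(x_n) - (y_n - y_{n-1})/h_{n-1}]$, to the interior equations (15.14). The enlarged system is still tridiagonal. The function and argument names below are assumptions.

```python
def clamped_spline_second_derivatives(x, y, slope_left, slope_right):
    """Return [m_0, ..., m_n] for the clamped cubic spline through (x_j, y_j)."""
    n = len(x) - 1
    h = [x[j + 1] - x[j] for j in range(n)]
    delta = [(y[j + 1] - y[j]) / h[j] for j in range(n)]   # chord slopes

    sub = [0.0] * (n + 1)
    diag = [0.0] * (n + 1)
    sup = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)

    # Left end: rearranged first relation of (15.22).
    diag[0], sup[0] = 2.0 * h[0], h[0]
    rhs[0] = 6.0 * (delta[0] - slope_left)
    # Interior knots: equations (15.14).
    for j in range(1, n):
        sub[j], diag[j], sup[j] = h[j - 1], 2.0 * (h[j - 1] + h[j]), h[j]
        rhs[j] = 6.0 * (delta[j] - delta[j - 1])
    # Right end: rearranged second relation of (15.22).
    sub[n], diag[n] = h[n - 1], 2.0 * h[n - 1]
    rhs[n] = 6.0 * (slope_right - delta[n - 1])

    # Tridiagonal forward elimination and back substitution.
    for i in range(1, n + 1):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]
    return m


# Data of Example 15.3: knots (2, 50), (4, 25), (5, 20), slopes -25 and -4.
m = clamped_spline_second_derivatives([2, 4, 5], [50, 25, 20], -25.0, -4.0)
```

For the data of Example 15.3 this gives $m_0 = 18.25$, $m_1 = 1$, and $m_2 = 2.5$, so that $c_0 = m_0/2 = 9.125$ and $c_1 = m_1/2 = 0.5$, in agreement with the example.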


Example 15.3 From first principles, using conditions I-V, obtain the cubic
spline through the points (2, 50), (4, 25), and (5, 20) with the clamped boundary
conditions $f'(2) = -25$ and $f'(5) = -4$.

Let the cubic spline in the interval from $x_0 = 2$ to $x_1 = 4$ be the polynomial

\[ f_0(x) = 50 + b_0(x - 2) + c_0(x - 2)^2 + d_0(x - 2)^3 \]

and the spline in the interval from $x_1 = 4$ to $x_2 = 5$ be the polynomial

\[ f_1(x) = 25 + b_1(x - 4) + c_1(x - 4)^2 + d_1(x - 4)^3. \]

The six coefficients $b_0, c_0, d_0, b_1, c_1, d_1$ are the unknowns that we need to
determine. From the interpolation conditions

\[
\begin{aligned}
f_0(4) &= 50 + 2b_0 + 4c_0 + 8d_0 = 25, \\
f_1(5) &= 25 + b_1 + c_1 + d_1 = 20.
\end{aligned}
\]

From the smoothness conditions at $x = 4$

\[
\begin{aligned}
f_0'(4) &= b_0 + 2c_0(4 - 2) + 3d_0(4 - 2)^2 = f_1'(4) = b_1, \\
f_0''(4) &= 2c_0 + 6d_0(4 - 2) = f_1''(4) = 2c_1.
\end{aligned}
\]

Finally, from the boundary conditions, we get

\[
\begin{aligned}
f_0'(2) &= b_0 = -25, \\
f_1'(5) &= b_1 + 2c_1 + 3d_1 = -4.
\end{aligned}
\]

Thus, we have six linear equations to determine the six unknowns. In matrix


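form, the equations are

\[
\begin{bmatrix}
2 & 4 & 8 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 \\
1 & 4 & 12 & -1 & 0 & 0 \\
0 & 2 & 12 & 0 & -2 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 2 & 3
\end{bmatrix}
\begin{bmatrix} b_0 \\ c_0 \\ d_0 \\ b_1 \\ c_1 \\ d_1 \end{bmatrix}
=
\begin{bmatrix} -25 \\ -5 \\ 0 \\ 0 \\ -25 \\ -4 \end{bmatrix}.
\]

The equations can be solved by successive elimination of unknowns. We get
$b_0 = -25$, then

\[
\begin{bmatrix}
4 & 8 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 \\
4 & 12 & -1 & 0 & 0 \\
2 & 12 & 0 & -2 & 0 \\
0 & 0 & 1 & 2 & 3
\end{bmatrix}
\begin{bmatrix} c_0 \\ d_0 \\ b_1 \\ c_1 \\ d_1 \end{bmatrix}
=
\begin{bmatrix} 25 \\ -5 \\ 25 \\ 0 \\ -4 \end{bmatrix}.
\]

Take $c_0 = 6.25 - 2d_0$, then

\[
\begin{bmatrix}
0 & 1 & 1 & 1 \\
4 & -1 & 0 & 0 \\
8 & 0 & -2 & 0 \\
0 & 1 & 2 & 3
\end{bmatrix}
\begin{bmatrix} d_0 \\ b_1 \\ c_1 \\ d_1 \end{bmatrix}
=
\begin{bmatrix} -5 \\ 0 \\ -12.5 \\ -4 \end{bmatrix}.
\]

Take $d_0 = 0.25b_1$, then

\[
\begin{bmatrix}
1 & 1 & 1 \\
2 & -2 & 0 \\
1 & 2 & 3
\end{bmatrix}
\begin{bmatrix} b_1 \\ c_1 \\ d_1 \end{bmatrix}
=
\begin{bmatrix} -5 \\ -12.5 \\ -4 \end{bmatrix}.
\]

Take $b_1 = -6.25 + c_1$, then

\[
\begin{bmatrix} 2 & 1 \\ 3 & 3 \end{bmatrix}
\begin{bmatrix} c_1 \\ d_1 \end{bmatrix}
=
\begin{bmatrix} 1.25 \\ 2.25 \end{bmatrix}.
\]

Finally, take $c_1 = 0.625 - 0.5d_1$ and get $d_1 = 0.25$. The final answer is

\[
\begin{bmatrix} b_0 \\ c_0 \\ d_0 \\ b_1 \\ c_1 \\ d_1 \end{bmatrix}
=
\begin{bmatrix} -25 \\ 9.125 \\ -1.4375 \\ -5.75 \\ 0.5 \\ 0.25 \end{bmatrix}.
\]

Thus the final interpolating cubic spline is

\[
f(x) = \begin{cases}
50 - 25(x - 2) + 9.125(x - 2)^2 - 1.4375(x - 2)^3, & 2 \le x < 4, \\
25 - 5.75(x - 4) + 0.5(x - 4)^2 + 0.25(x - 4)^3, & 4 \le x \le 5.
\end{cases}
\]

Fig. 15.7 Clamped and natural splines for Example 15.3.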


Figure 15.7 shows the interpolating cubic spline and the corresponding
natural cubic spline that is the solution of Exercise 15.3. It also shows the
function $h(x) = 100/x$, which also passes through the same three knots. The
slope of the clamped spline at the endpoints is the same as the slope of the
function $h(x)$. These endpoint conditions force the clamped spline to be much
closer to the function $h(x)$ than the natural spline. The natural spline has
endpoint conditions that force the spline to look more like a straight line near
the ends due to requiring the second derivative to be zero at the endpoints. □
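
The six conditions of Example 15.3 can also be checked mechanically. The sketch below is an illustration rather than the text's own procedure: it solves the $6 \times 6$ system for $(b_0, c_0, d_0, b_1, c_1, d_1)$ by Gauss-Jordan elimination in exact rational arithmetic. The helper name `solve_linear_system` is an assumption.

```python
from fractions import Fraction


def solve_linear_system(A, b):
    """Solve A z = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    # Augmented matrix [A | b] with rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# Rows: f0(4) = 25, f1(5) = 20, slope match, curvature match, f0'(2), f1'(5).
A = [
    [2, 4, 8, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 4, 12, -1, 0, 0],
    [0, 2, 12, 0, -2, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 2, 3],
]
b = [-25, -5, 0, 0, -25, -4]
solution = solve_linear_system(A, b)
print([float(v) for v in solution])
```

The printed values are the coefficients $-25$, $9.125$, $-1.4375$, $-5.75$, $0.5$, and $0.25$ found by successive elimination in the example.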


The cubic splines in this section all pass through the knots. If smoothing 
is desired, that restriction may be lifted. Smoothing splines are introduced in 
Section 15.6. 


Example 15.4 The data in the last column of Table 15.1 are one-year mor- 
tality rates for the 15 five-year age intervals shown in the first column. The 
last interval is treated as 95-99. We have used a natural cubic spline to in- 
terpolate between these values as follows. The listed mortality rate is treated 
as the one-year mortality rate for the middle age within the five-year interval. 
The resulting values are treated as knots for a natural cubic spline. The fit- 
ted interpolating cubic spline is shown in Figure 15.8 on a logarithmic scale. 
The formula for the spline is given in Property I of Definition 15.2. The 
coefficients of the 14 cubic segments of the spline are given in Table 15.2. 


Table 15.2 Spline coefficients for Example 15.4

  j   a_j            b_j             c_j             d_j
  0   3.8936×10^-3   -3.5093×10^-4    0               2.5230×10^-6
  1   2.4543×10^-3   -1.6171×10^-4    3.7844×10^-5   -8.4886×10^-7
  2   2.4857×10^-3    1.5307×10^-4    2.5112×10^-5   -5.1079×10^-7
  3   3.8150×10^-3    3.6587×10^-4    1.7450×10^-5    2.0794×10^-6
  4   6.3405×10^-3    6.9632×10^-4    4.8640×10^-5   -4.3460×10^-6
  5   1.0495×10^-2    8.5678×10^-4   -1.6550×10^-5    1.2566×10^-5
  6   1.5936×10^-2    1.6337×10^-3    1.7194×10^-4   -1.1922×10^-5
  7   2.6913×10^-2    2.4590×10^-3   -6.8828×10^-6    1.3664×10^-5
  8   4.0744×10^-2    3.4150×10^-3    1.9808×10^-4   -2.5761×10^-5
  9   5.9551×10^-2    3.4638×10^-3   -1.8833×10^-4    1.1085×10^-4
 10   8.6018×10^-2    9.8939×10^-3    1.4744×10^-3   -2.1542×10^-4
 11   1.4542×10^-1    8.4813×10^-3   -1.7569×10^-3    2.2597×10^-4
 12   1.7215×10^-1    7.8602×10^-3    1.6327×10^-3   -1.7174×10^-4
 13   2.3080×10^-1    1.1306×10^-2   -9.4349×10^-4    6.2899×10^-5

Fig. 15.8 Cubic spline fit to mortality data for Example 15.4.


15.3.2 Exercises 


15.3 Repeat Example 15.3 for the natural cubic spline by removing the 
clamped spline boundary conditions. 


15.4 Construct a natural cubic spline through the points (—2,0), (—1,1), 
(0,0), (1,1), and (2,0) by setting up the system of equations (15.16). 


15.5 Determine if the following functions can be cubic splines:

(a)
\[
f(x) = \begin{cases}
x, & -4 \le x \le 0, \\
x^3 + x, & 0 \le x \le 1, \\
3x^2 - 2x + 1, & 1 \le x \le 9.
\end{cases}
\]

(b)
\[
f(x) = \begin{cases}
x^3, & 0 \le x \le 1, \\
3x^2 - 3x + 1, & 1 \le x \le 2, \\
x^3 - 4x^2 + 13x - 11, & 2 \le x \le 4.
\end{cases}
\]
(c)
\[
f(x) = \begin{cases}
x^3 + 2x, & -1 \le x \le 0, \\
2x^2 + 2x, & 0 \le x \le 1, \\
x^3 - x^2 + 5x - 1, & 1 \le x \le 3.
\end{cases}
\]


15.6 Determine the coefficients a,b, and c so that 


fay={ ore 0<2<1, 
a + b(x - 1) + c(x - 1)^2 + 4(x - 1)^3, 1 ≤ x ≤ 3,


is a cubic spline. 


15.7 Determine the clamped cubic spline that agrees with $\sin(\pi x/2)$ at
$x = -1, 0, 1$.


15.8 Consider the function

\[
f(x) = \begin{cases}
28 + 25x + 9x^2 + x^3, & -3 \le x \le -1, \\
26 + 19x + 3x^2 - x^3, & -1 \le x \le 0, \\
26 + 19x + 3x^2 - 2x^3, & 0 \le x \le 3, \\
-163 + 208x - 60x^2 + 5x^3, & 3 \le x \le 4.
\end{cases}
\]

(a) Prove that $f(x)$ can be a cubic spline.
(b) Determine which of the five endpoint conditions could have been
used in developing this spline.
15.4 APPROXIMATING FUNCTIONS WITH SPLINES

The natural and clamped cubic splines have a particularly desirable property
when the spline is considered to be an approximation to some other continuous
function. For example, consider the function

\[ h(x) = \frac{100}{x}, \quad 2 \le x \le 5. \]

This function collocates with the knots at $x = 2, 4, 5$ in Example 15.3. Let
us suppose the knots had indeed come from this function. Then, we could
consider the interpolating cubic spline to be an approximation to the function
$h(x)$. In many applications, such as computer graphics, where smooth images
are needed, those smooth images can be represented very efficiently using a
limited number of selected knots and a cubic spline interpolation algorithm.

Smoothness can be measured by the total curvature of a function. The
most popular of such measures is the squared norm

\[ S = \int_{x_0}^{x_n} [f''(x)]^2\,dx, \tag{15.23} \]

representing the total squared second derivative.

Now consider any continuous function $h(x)$ that also has continuous first
and second derivatives over some interval $[x_0, x_n]$. Suppose that we select
$n-1$ interior knots $\{x_j, h(x_j)\}_{j=1}^{n-1}$ with $x_0 < x_1 < x_2 < \cdots < x_n$.
Let $f(x)$ be a cubic spline that collocates with these knots and has endpoint
conditions either

\[ f'(x_0) = h'(x_0) \ \text{and} \ f'(x_n) = h'(x_n) \quad \text{(clamped spline)} \]

or

\[ f''(x_0) = 0 \ \text{and} \ f''(x_n) = 0 \quad \text{(natural spline)}. \]

The natural or clamped cubic spline $f(x)$ has less total curvature than any
other function $h(x)$ passing through the $n+1$ knots, as shown in the following
theorem.


Theorem 15.5 Let $f(x)$ be the natural or clamped cubic spline passing through
the $n+1$ given knots. Let $h(x)$ be any function with continuous first and
second derivatives that passes through the same knots. Also, for the clamped
cubic spline assume $h'(x_0) = f'(x_0)$ and $h'(x_n) = f'(x_n)$. Then

\[ \int_{x_0}^{x_n} [f''(x)]^2\,dx \le \int_{x_0}^{x_n} [h''(x)]^2\,dx. \tag{15.24} \]


Proof: Let us form the difference $D(x) = h(x) - f(x)$. Then, $D''(x) =
h''(x) - f''(x)$ and therefore

\[ [h''(x)]^2 = [f''(x)]^2 + [D''(x)]^2 + 2f''(x)D''(x). \]

Integrating both sides produces

\[ \int_{x_0}^{x_n} [h''(x)]^2\,dx = \int_{x_0}^{x_n} [f''(x)]^2\,dx + \int_{x_0}^{x_n} [D''(x)]^2\,dx + 2\int_{x_0}^{x_n} f''(x)D''(x)\,dx. \]

The result will be proven if we can show that

\[ \int_{x_0}^{x_n} f''(x)D''(x)\,dx = 0 \]

because the total curvature of the function $h(x)$,

\[ \int_{x_0}^{x_n} [h''(x)]^2\,dx, \]

will be equal to the total curvature of the spline,

\[ \int_{x_0}^{x_n} [f''(x)]^2\,dx, \]

plus a nonnegative quantity

\[ \int_{x_0}^{x_n} [D''(x)]^2\,dx. \]

Applying integration by parts, we get

\[ \int_{x_0}^{x_n} f''(x)D''(x)\,dx = f''(x)D'(x)\Big|_{x_0}^{x_n} - \int_{x_0}^{x_n} f'''(x)D'(x)\,dx. \]

For the clamped cubic spline, the first term is zero because the clamped
boundary conditions imply that

\[
\begin{aligned}
D'(x_0) &= h'(x_0) - f'(x_0) = 0, \\
D'(x_n) &= h'(x_n) - f'(x_n) = 0.
\end{aligned}
\]

For the natural cubic spline $f''(x_0) = f''(x_n) = 0$, which also makes the first
term zero.


The integral in the second term can be divided into subintervals as follows:

\[ \int_{x_0}^{x_n} f'''(x)D'(x)\,dx = \sum_{j=0}^{n-1}\int_{x_j}^{x_{j+1}} f'''(x)D'(x)\,dx. \]

Integration by parts in each subinterval yields

\[ \int_{x_j}^{x_{j+1}} f'''(x)D'(x)\,dx = f'''(x)D(x)\Big|_{x_j}^{x_{j+1}} - \int_{x_j}^{x_{j+1}} f^{(4)}(x)D(x)\,dx. \]


The first term is zero because of the interpolation condition

\[ D(x_j) = h(x_j) - f(x_j) = 0, \quad j = 0, 1, \dots, n. \]

That is, we are only considering functions $h(x)$ that pass through the knots.
The second term is zero because the spline $f(x)$ in each subinterval is a
cubic polynomial and has zero fourth derivative. Thus, for the clamped or
natural cubic spline,

\[ \int_{x_0}^{x_n} f''(x)D''(x)\,dx = 0, \]

which proves the result. □


Thus the clamped cubic spline has great appeal if we want to produce a 
smooth set of successive values and if we have some knowledge of the slope of 
the function at each end of the interval. This is often the case in mortality 
table construction. At very early ages in the first few days and weeks of 
life, the force of mortality or hazard rate decreases sharply as a result of 
deaths of newborn lives with congenital and other conditions that contribute 
to neonatal deaths. At the highest ages, the force of mortality tends to flatten 
out at a level of between 0.3 and 0.4 at ages well over 100. Using a clamped 
cubic spline to graduate observed rates will result in obtaining the smoothest 
possible function that incorporates the desired properties at each end of the 
age spectrum. If the mortality data are only over some more limited age range 
(as is usually the case with life insurance or annuity data), either natural or 
clamped cubic splines can be used. Including a clamping condition controls 
the slope at the endpoints. 


Example 15.6 For the clamped cubic spline obtained in Example 15.3 calculate
the value of the squared norm measure of curvature. Calculate the same
quantity for the function $h(x) = 100/x$, which also passes through the given
knots.

The spline function is

\[
f(x) = \begin{cases}
50 - 25(x - 2) + 9.125(x - 2)^2 - 1.4375(x - 2)^3, & 2 \le x < 4, \\
25 - 5.75(x - 4) + 0.5(x - 4)^2 + 0.25(x - 4)^3, & 4 \le x \le 5,
\end{cases}
\]

and the second derivative is

\[
f''(x) = \begin{cases}
18.25 - 8.625(x - 2) = 35.5 - 8.625x, & 2 \le x < 4, \\
1 + 1.5(x - 4) = 1.5x - 5, & 4 \le x \le 5.
\end{cases}
\]

The total curvature of the spline is

\[
\begin{aligned}
\int_2^5 [f''(x)]^2\,dx &= \int_2^4 (35.5 - 8.625x)^2\,dx + \int_4^5 (1.5x - 5)^2\,dx \\
&= \int_1^{18.25} y^2\,\frac{1}{8.625}\,dy + \int_1^{2.5} y^2\,\frac{1}{1.5}\,dy \\
&= \frac{y^3}{25.875}\bigg|_1^{18.25} + \frac{y^3}{4.5}\bigg|_1^{2.5} = 238.125.
\end{aligned}
\]

For $h(x)$, the second derivative is $h''(x) = 200x^{-3}$ and the curvature is

\[ \int_2^5 (200x^{-3})^2\,dx = \int_2^5 40{,}000x^{-6}\,dx = -8{,}000x^{-5}\Big|_2^5 = 247.44. \]

Notice how close the total curvature of the function $h(x)$ and the clamped
spline are. Now look at Figure 15.7, which plots both functions. They are very
similar in shape. Hence we would expect them to have similar curvature. Of
course, as a result of Theorem 15.5, the curvature of the spline should be less,
though in this case it is only slightly less. In Exercise 15.9 you are asked to
calculate the total curvature of the corresponding natural spline (which also
appears in Figure 15.7). Because it is much “straighter,” you would expect its
total curvature to be significantly less, which is confirmed in Exercise 15.9. □
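
The two curvature figures in Example 15.6 can be reproduced numerically. Because $f''$ is linear on each segment, running from $m_j$ at $x_j$ to $m_{j+1}$ at $x_{j+1}$, its square integrates in closed form to $(h_j/3)(m_j^2 + m_jm_{j+1} + m_{j+1}^2)$ per segment. The sketch below uses that identity; the function names are assumptions, and the second derivatives 18.25, 1, and 2.5 are read off the $f''(x)$ of the example at $x = 2, 4, 5$.

```python
def spline_total_curvature(x, m):
    """Integral of [f''(x)]^2 for a cubic spline with f''(x_j) = m[j]."""
    total = 0.0
    for j in range(len(x) - 1):
        h = x[j + 1] - x[j]
        # Closed-form integral of the squared linear second derivative.
        total += h / 3.0 * (m[j] ** 2 + m[j] * m[j + 1] + m[j + 1] ** 2)
    return total


def reciprocal_total_curvature(lo, hi, scale=100.0):
    """Integral of [h''(x)]^2 over [lo, hi] for h(x) = scale / x."""
    # h''(x) = 2 * scale / x**3, so the integrand is 4 * scale**2 * x**-6.
    return 4.0 * scale ** 2 / 5.0 * (lo ** -5 - hi ** -5)


spline_s = spline_total_curvature([2.0, 4.0, 5.0], [18.25, 1.0, 2.5])
h_s = reciprocal_total_curvature(2.0, 5.0)
print(round(spline_s, 3), round(h_s, 2))
```

The two printed numbers correspond to the 238.125 and 247.44 obtained in the example, with the spline value the smaller of the two, as Theorem 15.5 requires.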


15.4.1 Exercise 


15.9 For the natural cubic spline obtained in Exercise 15.3 calculate the value 
of the squared norm measure of curvature. 


15.5 EXTRAPOLATING WITH SPLINES 


In many applications we may want to produce a model that can be faithful to a
set of historical data but that can also be used to forecast into the future. For
example, in determining liabilities of an insurer when future claim payments
are subject to inflationary growth, the actuary may need to project the rate
of future claims inflation for some 5 to 10 years into the future. One way to
do this is by fitting a function, in this case a cubic spline, to historic claims
inflation data.

Simply projecting the cubic in the last interval beyond $x_n$ may result in
excessive oscillatory behavior in the region beyond $x_n$. This could result in
projected values that are wildly unreasonable. It makes much more sense
to require projected values to form a simple pattern. In particular, a linear
projection is likely to be reasonable in most practical situations. This is easily
handled by cubic splines.

The natural cubic spline has endpoint conditions that require the second
derivatives to be zero at the endpoints. The natural extrapolation is linear
with the slope coming from the endpoints. Of course, the linear extrapolation
function can be done for any spline using the first derivative at the
endpoints. However, unless the second derivative is zero, as with the natural
spline, the second derivative condition will be violated at the endpoints. The
extrapolated values at each end are then

\[
\begin{aligned}
f(x) &= f(x_n) + f'(x_n)(x - x_n), & x &> x_n, \\
f(x) &= f(x_0) - f'(x_0)(x_0 - x), & x &< x_0.
\end{aligned}
\]


Example 15.7 Obtain formulas for the extrapolated values for the clamped
spline in Example 15.3 and determine the extrapolated values at $x = 0$ and
$x = 7$.

In the first interval, $f(x) = 50 - 25(x - 2) + 9.125(x - 2)^2 - 1.4375(x - 2)^3$
and so $f(2) = 50$ and $f'(2) = -25$. Then, for $x < 2$, the extrapolation
is $f(x) = 50 - (-25)(2 - x) = 100 - 25x$. In the final interval, $f(x) =
25 - 5.75(x - 4) + 0.5(x - 4)^2 + 0.25(x - 4)^3$ and so $f(5) = 20$ and $f'(5) = -4$.
Then, for $x > 5$, the extrapolation is $f(x) = 20 - 4(x - 5) = 40 - 4x$. At $x = 0$
the extrapolated value is $100 - 25(0) = 100$ and at $x = 7$ it is $40 - 4(7) = 12$. □
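
The extrapolation rule is simple enough to state as a small function. The sketch below assumes the spline is stored as one tuple $(a_j, b_j, c_j, d_j)$ per segment, in the form of Property I; that storage layout and the function name are illustrative assumptions, not something the text prescribes.

```python
def linear_extrapolation(x, coeffs, t):
    """Extend a piecewise cubic linearly beyond its first or last knot.

    x      -- knot abscissas x_0 < ... < x_n
    coeffs -- one (a_j, b_j, c_j, d_j) tuple per segment
    t      -- a point outside [x_0, x_n]
    """
    if t < x[0]:
        a, b, _, _ = coeffs[0]            # f(x_0) = a_0 and f'(x_0) = b_0
        return a - b * (x[0] - t)
    if t > x[-1]:
        a, b, c, d = coeffs[-1]
        h = x[-1] - x[-2]
        value = a + b * h + c * h ** 2 + d * h ** 3       # f(x_n)
        slope = b + 2.0 * c * h + 3.0 * d * h ** 2        # f'(x_n)
        return value + slope * (t - x[-1])
    raise ValueError("t lies inside the knot range; evaluate the spline instead")


# Clamped spline of Example 15.3.
knots = [2.0, 4.0, 5.0]
coeffs = [(50.0, -25.0, 9.125, -1.4375), (25.0, -5.75, 0.5, 0.25)]
print(linear_extrapolation(knots, coeffs, 0.0), linear_extrapolation(knots, coeffs, 7.0))
```

For the clamped spline of Example 15.3 the function returns 100 at $x = 0$ and 12 at $x = 7$, matching the values found above.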


15.5.1 Exercise 


15.10 Obtain formulas for the extrapolated values for the natural spline in
Exercise 15.3 and determine the extrapolated values at $x = 0$ and $x = 7$.


15.6 SMOOTHING SPLINES 


In many actuarial applications, it may be desirable to do more than interpolate
between observed data. If data include a random (or “noise”) element, it is
often best to allow the cubic spline or other smooth function to lie near the
data points, rather than requiring the function to pass through each data
point.

In the terminology of graduation theory as developed by actuaries in the
early 1900s, this is called modified osculatory interpolation. The term
modified is added to recognize that the points of intersection (or knots in the
language of splines) are modified from the original data points.

The technical development of smoothing cubic splines is identical to
interpolating cubic splines except that the original knots at each data point $(x_j, y_j)$
are replaced by knots at $(x_j, a_j)$ where the ordinate $a_j$ is the constant term
in the smoothing cubic spline

\[ f_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3. \tag{15.25} \]

We first imagine that the ordinates of original data points are the outcomes
of the model

\[ y_j = g(x_j) + \epsilon_j, \]

where $\epsilon_j$, $j = 0, 1, \dots, n$, are independently distributed random variables with
mean 0 and variance $\sigma_j^2$ and where $g(x)$ is a well-behaved function.³


³ Without specifying what “well-behaved” means in technical terms, we are simply trying
to say that $g(x)$ is smooth in a general way. Typically we will require at least the first two
derivatives to be continuous.

Example 15.8 Mortality rates $q_j$ at each age $j$ are estimated by the ratio of
observed deaths to the number of life-years of exposure, $D_j/n_j$, where $D_j$ is
a binomial $(n_j, q_j)$ random variable. The estimator $\hat{q}_j = d_j/n_j$, where $d_j$ is
the observed number of deaths, has variance $\sigma_j^2 = q_j(1 - q_j)/n_j$, which can be
estimated by $\hat{q}_j(1 - \hat{q}_j)/n_j$.
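
As a small numerical illustration of Example 15.8, the sketch below computes the estimated rate and its estimated variance for a single age. The function name is an assumption; the sample inputs (6 deaths among 135 exposed) are the first row of Table 15.4.

```python
def mortality_rate_and_variance(deaths, exposure):
    """Return (q_hat, estimated variance of q_hat) under the binomial model."""
    q_hat = deaths / exposure
    variance = q_hat * (1.0 - q_hat) / exposure
    return q_hat, variance


# First row of Table 15.4: 6 observed deaths among 135 lives exposed to risk.
q_hat, variance = mortality_rate_and_variance(6, 135)
print(round(q_hat, 3))
```

The rounded rate printed here is the 0.044 shown as the estimated mortality rate for age 70 in Table 15.4; the variance is the quantity that enters the fit criterion as $\sigma_j^2$.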


We attempt to find a smooth function $f(x)$, in this case a cubic spline,
that will serve as an approximation to the “true” function $g(x)$. Because $g(x)$
is assumed to be well behaved, we will require the smoothing cubic spline
$f(x)$ itself to be as smooth as possible. On the other hand, we want it to be
faithful to the given data as much as possible. These are conflicting objectives.
Therefore, a compromise will be necessary between fit and smoothness.

The degree of fit can be measured using the chi-square criterion

\[ F = \sum_{j=0}^{n}\left(\frac{y_j - a_j}{\sigma_j}\right)^2. \tag{15.26} \]

This is a standard statistical criterion for measuring the degree of fit and was
discussed in that context in Section 13.4.3. It has a chi-square distribution
with $n + 1$ degrees of freedom.⁴

The degree of smoothness can be measured by the overall smoothness of
the cubic spline. The smoothness, or equivalently the total curvature, can be
measured by the squared norm smoothness criterion

\[ S = \int_{x_0}^{x_n} [f''(x)]^2\,dx. \]

It was shown in Theorem 15.5 that within the broad class of functions with 
continuous first and second derivatives, the natural or clamped cubic spline 
minimizes the squared norm. This supports the choice of the cubic spline as 
the smoothing function. 

In order to recognize the conflicting objectives of fit and smoothness, we
construct a criterion which is a weighted average of the measures of fit and
smoothness. Let

\[
\begin{aligned}
L &= pF + (1 - p)S \\
&= p\sum_{j=0}^{n}\left(\frac{y_j - a_j}{\sigma_j}\right)^2 + (1 - p)\int_{x_0}^{x_n} [f''(x)]^2\,dx.
\end{aligned}
\]

The parameter $p$ reflects the relative importance which we give to the
conflicting objectives of remaining close to the data, on the one hand, and of
obtaining a smooth curve, on the other hand. Notice that a linear function
satisfies the equation

\[ S = \int_{x_0}^{x_n} [f''(x)]^2\,dx = 0, \]

which suggests that, in the limiting case, where $p = 0$ and thus smoothness is
all that matters, the spline function $f(x)$ will become a straight line. At the
other extreme, where $p = 1$ and thus the closeness of the spline to the data is
all that matters, we will obtain an interpolating spline which passes exactly
through the data points.

⁴ No degree of freedom is lost, because unlike with the goodness-of-fit test, if you know all
but one of the terms of the sum, it is not possible to infer the remaining value.

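The spline is piecewise cubic and thus the smoothness criterion can be
written

\[ S = \int_{x_0}^{x_n} [f''(x)]^2\,dx = \sum_{j=0}^{n-1}\int_{x_j}^{x_{j+1}} [f_j''(x)]^2\,dx. \]

From (15.5),

\[ f_j''(x) = \frac{m_j}{h_j}(x_{j+1} - x) + \frac{m_{j+1}}{h_j}(x - x_j), \]

and then

\[
\begin{aligned}
\int_{x_j}^{x_{j+1}} [f_j''(x)]^2\,dx &= \int_{x_j}^{x_{j+1}} \left[m_j\,\frac{x_{j+1} - x}{h_j} + m_{j+1}\,\frac{x - x_j}{h_j}\right]^2 dx \\
&= \int_0^1 [m_j(1 - y) + m_{j+1}y]^2\,h_j\,dy \\
&= h_j\int_0^1 [m_j + (m_{j+1} - m_j)y]^2\,dy \\
&= h_j\,\frac{[m_j + (m_{j+1} - m_j)y]^3}{3(m_{j+1} - m_j)}\bigg|_0^1 \\
&= \frac{h_j}{3}\left(m_j^2 + m_jm_{j+1} + m_{j+1}^2\right),
\end{aligned}
\]

where the substitution $y = (x - x_j)/h_j$ is used in the second line. The criterion
function then becomes

\[ L = p\sum_{j=0}^{n}\left(\frac{y_j - a_j}{\sigma_j}\right)^2 + (1 - p)\sum_{j=0}^{n-1}\frac{h_j}{3}\left(m_j^2 + m_jm_{j+1} + m_{j+1}^2\right). \]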
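
To make the trade-off concrete, the criterion can be evaluated for any candidate set of ordinates $a_j$ and second derivatives $m_j$. The following sketch is illustrative only; the function names and the small data set in the usage lines are assumptions.

```python
def fit_criterion(y, a, sigma):
    """F of (15.26): sum of squared standardized deviations."""
    return sum(((yj - aj) / sj) ** 2 for yj, aj, sj in zip(y, a, sigma))


def smoothness_criterion(h, m):
    """S written in terms of the second derivatives m_j at the knots."""
    return sum(hj / 3.0 * (m[j] ** 2 + m[j] * m[j + 1] + m[j + 1] ** 2)
               for j, hj in enumerate(h))


def combined_criterion(p, y, a, sigma, h, m):
    """L = p F + (1 - p) S, the weighted average of fit and smoothness."""
    return p * fit_criterion(y, a, sigma) + (1.0 - p) * smoothness_criterion(h, m)


# Hypothetical three-knot illustration: F = 4, S = 6, so L = 0.25*4 + 0.75*6.
L = combined_criterion(0.25, y=[1.0, 2.0, 4.0], a=[1.0, 2.0, 3.0],
                       sigma=[1.0, 1.0, 0.5], h=[1.0, 1.0], m=[0.0, 3.0, 0.0])
print(L)  # 5.5
```

Evaluating $L$ this way for a few values of $p$ shows how quickly the fit term can dominate when the $\sigma_j$ are small.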


We need to minimize this function with respect to the $2n + 2$ unknown
quantities $\{a_j, m_j;\ j = 0, \dots, n\}$. Note that when we have solved for these
variables we will have four pieces of information $\{a_j, a_{j+1}, m_j, m_{j+1}\}$ for each
interval $[x_j, x_{j+1}]$. This allows us to fully specify the interpolating cubic spline
in each interval. We now address the issue of solving for these quantities.

We now consider the natural smoothing spline. The equations developed
for interpolating splines apply to smoothing cubic splines except that the
$y_j$'s are replaced by $a_j$'s to recognize that the ordinates $\{a_j;\ j = 0, \dots, n\}$ of
the smoothing spline need not coincide with the ordinates of the data points
$\{y_j;\ j = 0, \dots, n\}$. From (15.16), we can write

\[ \mathbf{H}\mathbf{m} = \mathbf{u}, \]

where $\mathbf{m} = (m_1, m_2, \dots, m_{n-1})^T$ and $\mathbf{u} = (u_1, u_2, \dots, u_{n-1})^T$ because $m_0 =
m_n = 0$ from the natural spline condition. From (15.14), the vector $\mathbf{u}$ can be
rewritten as

\[ \mathbf{u} = \mathbf{R}\mathbf{a}, \]

where $\mathbf{R}$ is the $(n-1) \times (n+1)$ matrix

\[
\mathbf{R} = \begin{bmatrix}
r_0 & -(r_0 + r_1) & r_1 & 0 & \cdots & \cdots & 0 \\
0 & r_1 & -(r_1 + r_2) & r_2 & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & & \vdots \\
0 & \cdots & \cdots & 0 & r_{n-2} & -(r_{n-2} + r_{n-1}) & r_{n-1}
\end{bmatrix}
\]

and

\[ \mathbf{a} = (a_0, a_1, \dots, a_n)^T, \qquad r_j = 6h_j^{-1}. \]

Then we have

\[ \mathbf{H}\mathbf{m} = \mathbf{R}\mathbf{a}. \tag{15.27} \]


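We can now rewrite the criterion $L$ as

\[ L = p(\mathbf{y} - \mathbf{a})^T\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) + \tfrac{1}{6}(1 - p)\mathbf{m}^T\mathbf{H}\mathbf{m}, \]

where $\boldsymbol{\Sigma} = \mathrm{diag}\{\sigma_0^2, \sigma_1^2, \dots, \sigma_n^2\}$. Because $\mathbf{m} = \mathbf{H}^{-1}\mathbf{R}\mathbf{a}$, we can rewrite the
criterion as

\[ L = p(\mathbf{y} - \mathbf{a})^T\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) + \tfrac{1}{6}(1 - p)\mathbf{a}^T\mathbf{R}^T\mathbf{H}^{-1}\mathbf{R}\mathbf{a}. \]

We can differentiate the criterion $L$ with respect to each of $a_0, a_1, \dots, a_n$
successively to obtain the optimal values of the ordinates. In matrix notation,
the result is (after dividing the derivative by 2)

\[ -p(\mathbf{y} - \mathbf{a})^T\boldsymbol{\Sigma}^{-1} + \tfrac{1}{6}(1 - p)\mathbf{a}^T\mathbf{R}^T\mathbf{H}^{-1}\mathbf{R} = \mathbf{0}^T, \]

where $\mathbf{0}$ is the $(n + 1) \times 1$ vector of zeros, $(0, \dots, 0)^T$. This yields, after
transposition,

\[ 6p\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) = (1 - p)\mathbf{R}^T\mathbf{H}^{-1}\mathbf{R}\mathbf{a} \]

or

\[ 6p\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) = (1 - p)\mathbf{R}^T\mathbf{m}. \tag{15.28} \]

We now premultiply by $\mathbf{R}\boldsymbol{\Sigma}$, yielding

\[ 6p\mathbf{R}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) = (1 - p)\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\mathbf{m} \]

or

\[ 6p(\mathbf{R}\mathbf{y} - \mathbf{R}\mathbf{a}) = (1 - p)\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\mathbf{m}. \tag{15.29} \]

Because $\mathbf{H}\mathbf{m} = \mathbf{R}\mathbf{a}$, this reduces to

\[ p\mathbf{R}\mathbf{y} - p\mathbf{H}\mathbf{m} = \tfrac{1}{6}(1 - p)\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\mathbf{m} \]

or

\[ \left(p\mathbf{H} + \tfrac{1}{6}(1 - p)\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\right)\mathbf{m} = p\mathbf{R}\mathbf{y}. \tag{15.30} \]

This is a system of $n-1$ equations in $n-1$ unknowns. The system of equations
can be solved for $m_1, m_2, \dots, m_{n-1}$. Using matrix methods, the solution can
be obtained from (15.30) as

\[ \mathbf{m} = \left(\mathbf{H} + \frac{1 - p}{6p}\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\right)^{-1}\mathbf{R}\mathbf{y}. \tag{15.31} \]

Now, the values of $a_0, a_1, \dots, a_n$ can be obtained by rewriting (15.28) as

\[ \mathbf{a} = \mathbf{y} - \frac{1 - p}{6p}\boldsymbol{\Sigma}\mathbf{R}^T\mathbf{m}. \tag{15.32} \]

Finally, substitution of (15.31) into (15.32) results in

\[ \mathbf{a} = \mathbf{y} - \frac{1 - p}{6p}\boldsymbol{\Sigma}\mathbf{R}^T\left(\mathbf{H} + \frac{1 - p}{6p}\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\right)^{-1}\mathbf{R}\mathbf{y}. \tag{15.33} \]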

Thus we have obtained the values of the intercepts of the $n$ cubic spline
segments of the smoothing spline. The values of the other coefficients of the
spline segments can now be calculated in the same way as for the natural
interpolating spline, as discussed in Section 15.3 using the knots $\{(x_j, a_j),\
j = 0, \dots, n\}$ and setting $m_0 = m_n = 0$. It should be noted that the only
additional calculation for the natural smoothing spline as compared with the
natural interpolation spline is given by (15.33).
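
The calculation in (15.31) and (15.32) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a production routine: it uses dense Gaussian elimination instead of exploiting the band structure, it requires $0 < p \le 1$ (the case $p = 0$ needs separate handling), and the function names and sample data are hypothetical.

```python
def solve_dense(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def natural_smoothing_spline(x, y, variances, p):
    """Return (a, m): smoothed ordinates and second derivatives at the knots."""
    if not 0.0 < p <= 1.0:
        raise ValueError("this sketch requires 0 < p <= 1")
    n = len(x) - 1
    if n < 2:
        raise ValueError("need at least three knots")
    h = [x[j + 1] - x[j] for j in range(n)]
    r = [6.0 / hj for hj in h]
    size = n - 1
    H = [[0.0] * size for _ in range(size)]
    R = [[0.0] * (n + 1) for _ in range(size)]
    for i in range(size):                      # row i is equation j = i + 1
        H[i][i] = 2.0 * (h[i] + h[i + 1])
        if i + 1 < size:
            H[i][i + 1] = H[i + 1][i] = h[i + 1]
        R[i][i], R[i][i + 1], R[i][i + 2] = r[i], -(r[i] + r[i + 1]), r[i + 1]
    w = (1.0 - p) / (6.0 * p)
    # Left-hand matrix H + w R Sigma R^T and right-hand side R y of (15.31).
    lhs = [[H[i][c] + w * sum(R[i][t] * variances[t] * R[c][t]
                              for t in range(n + 1))
            for c in range(size)] for i in range(size)]
    rhs = [sum(R[i][t] * y[t] for t in range(n + 1)) for i in range(size)]
    inner = solve_dense(lhs, rhs)
    # Smoothed ordinates from (15.32): a = y - w Sigma R^T m.
    a = [y[t] - w * variances[t] * sum(R[i][t] * inner[i] for i in range(size))
         for t in range(n + 1)]
    return a, [0.0] + inner + [0.0]


# Hypothetical noisy observations at equally spaced knots with unit variances.
a, m = natural_smoothing_spline([0.0, 1.0, 2.0, 3.0, 4.0],
                                [0.0, 1.0, 0.0, 1.0, 0.0], [1.0] * 5, p=0.5)
```

Two sanity checks follow from the derivation: with $p = 1$ the routine returns $\mathbf{a} = \mathbf{y}$, the interpolating natural spline, and for data lying exactly on a straight line $\mathbf{R}\mathbf{y} = \mathbf{0}$, so nothing is changed for any $p$. The remaining coefficients $b_j$, $c_j$, $d_j$ then follow from the returned $a_j$ and $m_j$ exactly as for the interpolating spline.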

The magnitude of the values of the criteria for fit F and smoothness S 
may be very different. Therefore one should not place any significance on the 
specific choice of the value of p (unless it is 0 or 1). Smaller values of p result
in more smoothing; larger values result in less. In some applications it may 
be necessary to make the value of p very small, for example, 0.001, to begin 
to get visual images with any significant amount of smoothing. This is, in 
part, due to the role of the variances which appear in the denominator of the 
fit criterion. Small variances can result in the fit term being much larger than 
the smoothness term. Therefore, it may be necessary to have a very small 
value for p to get any visible smoothing. 


Example 15.9 Construct natural cubic smoothing splines for the data in Ta- 
ble 15.1. The natural cubic interpolating spline through the mortality rates was 
shown in Figure 15.8. 


Natural cubic smoothing splines with p = 0.5 and p = 0.1 are shown in
Figures 15.9 and 15.10. The coefficients for the smoothing spline with p = 0.1
are given in Table 15.3. Note that the resulting splines look much like the
one in Figure 15.8 except near the upper end of the data where there are
relatively fewer actual deaths and less smoothness in the successive observed
values. Also observe the increased smoothness in the spline in Figure 15.10
resulting from the smaller emphasis on fit. The standard deviations were
calculated as in Example 15.8 with the resulting values multiplied by 1,000 to
make the numbers more reasonable.⁵ □

Fig. 15.9 Smoothing spline with p = 0.5 for Example 15.9.


Example 15.9 illustrates how the smoothing splines can be used to carry out 
both interpolation and smoothing automatically. The knots at quinquennial 
ages were smoothed using (15.32). The modified knots were then used as knots 
for an interpolating spline. The interpolated values are the revised mortality 
rates at the intermediate ages. The smoothing effect was not visually dramatic 
in Example 15.9 because the original data series was already quite smooth.
The next example illustrates how successive values in a very noisy series can 
be smoothed dramatically using a smoothing spline. 


Example 15.10 Table 15.4 gives successive observed mortality rates for a 
15-year period. The data can be found in Miller [94], p. 11 and are shown 
in Figure 15.11. Fit smoothing splines changing p until smoothing appears 
reasonable and provide values of the revised mortality rates at each age. 


Unlike Example 15.9, the numbers represent numbers of persons, not dollar 
amounts, and can be used directly in the estimates of the variances of the 


5Had the values not been multiplied by 1,000 the same answers could have been obtained 
by altering the value of p. This method of calculating the standard deviations does not 
consider the possible variation in sizes of the insurance policies. See Klugman [75] for a 
more detailed treatment. The method used here implicitly treats all policies as being of 
the same size. That size is not important because as with the factor of 1,000, a constant of 
proportionality can be absorbed into p. Í 
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Fig. 15.10 Smoothing spline with p = 0.1 for Example 15.9. 


Table 15.3 Spline coefficients for Example 15.9 with p = 0.1 


Tj Qj bj Cj dj 


27 3.8790x1073 -3.4670x1074 0  2.4846x107° 
32 2.4560x1073 -1.6036x10-* 3.7269x1075 -8.0257x1077 
37 2.4856x1073 =: 11.5214x107* 2.5230x1078 -5.0019x1077 
42 3.8146x1073 = 3.6693x107* 1.7728x107ë 1.9945x107ê 
6.3417x1073 6.9379x1074 4.7644x107ë -4.0893x10-° 
52 1.0491x1072? 8.6353x1074 = -1.3695x10-5 1.1833x10-~ë 
57  1.5945x107? 1.6141x1073 1.6380x1074 -9.7024x107° 
62 2.6898x107? 2.5244x1073 1.8268x107ë = 6.2633x107® 
67 4.0759x1072? 3.1769x1078 1.1222x107¢ = -9.4435 1077 
72  5.9331x1072 = 4.2282x10-8 9.8052x1075 3.9737x10~ë 
10 77 8.7891x1072? 8.1890x1073 6.9411x1074 -6.6804x1075 
11 82 1.3784x1071 1.0120x107? -3.0794x1074 2.1572x107* 
12 87 = 1.8344x1071 8.6583x1073 1.5633x1075 — -1.0282x1076 
13 92 2.2699x10-! 8.7376x1078 =. 2.1021x10-* = -1.4014x 1078 


WONonunrwne O[S. 
ih 
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mortality rates (see Example 15.8). The standard deviations are multiplied by 
a factor of 10 for convenience. For insurance purposes, we are more interested 
in the spline values at the knots, that is, the aj. The interpolated values are 
given in Table 15.4 and the spline values are plotted in Figures 15.12-15.14
for p = 0.5, 0.1, and 0.05. Note that for p = 0.5 there is significant smoothing but
that some points still have a lot of influence on the result. For example, the
large number of actual deaths at age 76 causes the curve to be pulled upward.
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Table 15.4 Mortality rates and interpolated values for Example 15.10 


 j   Age   Exposed   Observed   Estimated    Smoothed   Smoothed   Smoothed
     x_j   to risk   deaths     mort. rate   p = 0.5    p = 0.1    p = 0.05
 0   70    135        6         0.044        0.046      0.050      0.052
 1   71    143       12         0.084        0.078      0.069      0.065
 2   72    140       10         0.071        0.077      0.071      0.069
 3   73    144       11         0.076        0.066      0.064      0.065
 4   74    149        6         0.040        0.049      0.062      0.066
 5   75    154       16         0.104        0.100      0.084      0.080
 6   76    150       24         0.160        0.126      0.096      0.089
 7   77    139        8         0.058        0.076      0.087      0.088
 8   78    145       16         0.110        0.091      0.091      0.093
 9   79    140       13         0.093        0.102      0.105      0.107
10   80    137       19         0.139        0.131      0.128      0.128
11   81    136       21         0.154        0.157      0.155      0.154
12   82    126       23         0.183        0.182      0.181      0.181
13   83    126       26         0.206        0.208      0.209      0.208
14   84    109       26         0.239        0.238      0.237      0.236
 -   Total 2,073     237


Fig. 15.12 Smoothing spline for mortality data with p = 0.5.


More smoothing can be obtained by reducing p, as can be observed from the
three figures. □


Fig. 15.13 Smoothing spline for mortality data with p = 0.1.

Example 15.10 demonstrated the smoothing capability of splines. However, 
one still needs to choose a value of p. In practice, this is done using professional 
judgment and visual inspection. If, as with Example 15.9, data sets are large 
and there is already some degree of smoothness in the observed data, then 
a fitted curve which closely follows the data is likely highly desirable. If 
the available data set is more limited, as with Example 15.10, considerable 
smoothing is needed and judgment plays a large role. For data sets of any 
size, formal tests of fit can be conducted. The fit criterion F has a chi-square
distribution with n + 1 degrees of freedom. This can provide some guidance.


Fig. 15.11 Mortality data for Example 15.10.


Fig. 15.14 Smoothing spline for mortality data with p = 0.05.


Other tests of fit, such as the runs test, can also be employed to identify
specific anomalies of the fitted spline.


15.6.1 Exercise 


15.11 Consider the natural cubic smoothing spline that smooths the points
(0,0), (1,2), (2,1), (3,3) using p = 0.9 and standard deviations of 0.5. (Use
a spreadsheet for the calculations.)

(a) Obtain the values of the intercepts of the nodes by using (15.33). 


(b) Obtain the natural cubic smoothing spline as the natural interpo- 
lating spline through the nodes using (15.16) and (15.18). 


(c) Graph the resulting spline from x = -0.5 to x = 2.5.


Credibility 


16.1 INTRODUCTION 


Credibility theory is a set of quantitative tools which allows an insurer to 
perform prospective experience rating (adjust future premiums based on past 
experience) on a risk or group of risks. If the experience of a policyholder is 
consistently better than that assumed in the underlying manual rate (some- 
times called the pure premium), then the policyholder may demand a rate 
reduction. 

The policyholder’s argument is as follows: The manual rate is designed 
to reflect the expected experience of the entire rating class and implicitly 
assumes that the risks are homogeneous. However, no rating system is perfect, 
and there always remains some heterogeneity in the risk levels after all the 
underwriting criteria are accounted for. Consequently, some policyholders will 
be better risks than that assumed in the underlying manual rate. Of course, 
the same logic dictates that a rate increase should be applied to a poor risk, 
but the policyholder in this situation is certainly not going to ask for a rate 
increase! Nevertheless, an increase may be necessary, due to considerations of 
equity and the economics of the situation.

The insurer is then forced to answer the following question: How much of 
the difference in experience of a given policyholder is due to random variation 


Loss Models: From Data to Decisions, Second Edition. 
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot 
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc. 
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in the underlying claims experience and how much is due to the fact that the 
policyholder really is a better or worse risk than average for the given rating 
class? In other words, how credible is the policyholder’s own experience? Two
facts must be considered in this regard: 


1. The more past information the insurer has on a given policyholder, the 
more credible the policyholder’s own experience, all else being equal. 
In the same vein, in group insurance the experience of larger groups is 
more credible than that of smaller groups. 


2. Competitive considerations may force the insurer to give full (using the 
past experience of the policyholder only and not the manual rate) or 
nearly full credibility to a given policyholder in order to retain the busi- 
ness. 


Another use for credibility is in the setting of rates for classification systems. 
For example, in workers compensation insurance there may be hundreds of 
occupational classes, some of which may provide very little data. In order 
to accurately estimate the expected cost for insuring these classes, it may 
be appropriate to combine the limited actual experience with some other 
information, such as past rates, or the experience of occupations that are 
closely related. 

From a statistical perspective, credibility theory leads to a result that would 
appear to be counterintuitive. If experience from an insured or group of 
insureds is available, our statistical training may convince us to use the sample 
mean or some other unbiased estimator. But credibility theory tells us that it 
is optimal to give only partial weight to this experience and give the remaining 
weight to an estimator produced from other information. We will discover that 
what we sacrifice in terms of bias we gain in terms of reducing the average 
(squared) error. 

Credibility theory allows an insurer to quantitatively formulate the above 
problem, and this chapter provides an introduction to this theory. A few 
relevant statistical concepts are reviewed in the next section. Some topics
that were covered in Sections 9.2 and 12.4 are repeated, and there are some new
formulas and concepts as well.

Section 16.3 deals with limited fluctuation credibility theory, a sub- 
ject developed in the early part of the twentieth century. This provides a 
mechanism for assigning full (Section 16.3.1) or partial (Section 16.3.2) cred- 
ibility to a policyholder’s experience. The difficulty with this approach is the 
lack of a sound underlying mathematical theory justifying the use of these 
methods. Nevertheless, this approach provided the original treatment of the 
subject and is still in use today. 

A classic paper by Bühlmann in 1967 [18] provided a statistical framework
within which credibility theory has developed and flourished. While this
approach, termed greatest accuracy credibility theory,¹ was formalized by
Bühlmann, the basic ideas were around for some time. This approach is
introduced in Section 16.4. The simplest model, that of Bühlmann [18], is
discussed in Section 16.4.4. Practical improvements were made by Bühlmann
and Straub in 1970 [20]. Their model is discussed in Section 16.4.5. The
concept of exact credibility is presented in Section 16.4.6.

Practical use of the theory requires that unknown model parameters be
estimated from data. Nonparametric estimation (where the problem is somewhat
model free and the parameters are generic, such as the mean and variance) 
is considered in Section 16.5.1, semiparametric estimation (where some of the 
parameters are based on assuming particular distributions) in Section 16.5.2, 
and finally the fully parametric situation (where all parameters come from 
assumed distributions) in Section 16.5.3. 

We close with a quote from Arthur Bailey in 1950 [8], p. 8, that aptly 
summarizes much of the history of credibility. We, too, must tip our hats 
to the early actuaries, who, with unsophisticated mathematical tools at their 
disposal, were able to come up with formulas that not only worked but also 
were very similar to those we carefully develop in this chapter. 


It is at this point in the discussion that the ordinary individual has to 
admit that, while there seems to be some hazy logic behind the actu- 
aries’ contentions, it is too obscure for him to understand. The trained 
statistician cries “Absurd! Directly contrary to any of the accepted the- 
ories of statistical estimation.” The actuaries themselves have to admit 
that they have gone beyond anything that has been proven mathemat- 
ically, that all of the values involved are still selected on the basis of 
judgment, and that the only demonstration they can make is that, in 
actual practice, it works. Let us not forget, however, that they have 
made this demonstration many times. It does work! 


16.2 STATISTICAL CONCEPTS 


In this section various statistical concepts relevant to credibility theory are 
presented. Much of the material is of a review nature and hence may be 
quickly glossed over by a reader with a good background in statistics. Nev- 
ertheless, there may be some material which may not have been seen before, 
and so this section should not be completely ignored. Subsequent sections 
will refer back to this material. 


1 The terms limited fluctuation and greatest accuracy go back at least as far as a 1943 paper 
by Arthur Bailey [7]. 
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16.2.1 Conditional distributions 


Suppose that X and Y are two random variables with joint probability
function (pf) or probability density function (pdf)² $f_{X,Y}(x,y)$ and marginal pfs
$f_X(x)$ and $f_Y(y)$, respectively. The conditional pf of X given that Y = y is

\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}. \tag{16.1} \]


If X and Y are discrete random variables, then (16.1) is the conditional
probability of the event X = x under the hypothesis that Y = y. If X and Y are
continuous, then (16.1) may be interpreted as a definition. When X and Y
are independent random variables,

\[ f_{X,Y}(x,y) = f_X(x) f_Y(y), \]

and in this case, (16.1) yields

\[ f_{X|Y}(x|y) = f_X(x). \]


We observe that the conditional and marginal distributions of X are identical. 


2 When it is unclear, or when the random variable may be continuous, discrete, or a mixture
of the two, the term probability function and abbreviation pf will be used. The term
probability density function and the abbreviation pdf will be used only when the random
variable is known to be continuous.


Example 16.1 Suppose X and Z are independent Poisson random variables
with means $\lambda_1$ and $\lambda_2$, respectively. Let Y = X + Z. Demonstrate that
X|Y = y is binomial with parameters m = y and $q = \lambda_1/(\lambda_1 + \lambda_2)$ (see, for
example, [58], p. 131).


The conditional distribution of X given that Y = y is

\[
\begin{aligned}
f_{X|Y}(x|y) &= \frac{f_{X,Y}(x,y)}{f_Y(y)} \\
&= \frac{\Pr(X = x, Y = y)}{\Pr(Y = y)} \\
&= \frac{\Pr(X = x, Z = y - x)}{\Pr(Y = y)} \\
&= \frac{\Pr(X = x)\Pr(Z = y - x)}{\Pr(Y = y)} \\
&= \frac{\dfrac{\lambda_1^{x} e^{-\lambda_1}}{x!}\,\dfrac{\lambda_2^{y-x} e^{-\lambda_2}}{(y-x)!}}{\dfrac{(\lambda_1 + \lambda_2)^{y} e^{-\lambda_1 - \lambda_2}}{y!}} \\
&= \frac{y!}{x!(y-x)!} \left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^{x} \left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{y-x}
\end{aligned}
\]

for x = 0, 1, 2, ..., y. This is a binomial distribution with parameters m = y
and $q = \lambda_1/(\lambda_1 + \lambda_2)$. □
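The result of Example 16.1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the values chosen for the two Poisson means and for y are illustrative assumptions. It evaluates the conditional pf directly from (16.1) and compares it with the binomial pf.

```python
from math import comb, exp, factorial, isclose


def poisson_pf(k, mean):
    """Poisson probability Pr(N = k) for the given mean."""
    return mean**k * exp(-mean) / factorial(k)


def conditional_pf(x, y, lam1, lam2):
    """Pr(X = x | X + Z = y), computed as joint / marginal as in (16.1)."""
    joint = poisson_pf(x, lam1) * poisson_pf(y - x, lam2)  # X and Z independent
    marginal = poisson_pf(y, lam1 + lam2)                   # Y = X + Z is Poisson
    return joint / marginal


def binomial_pf(x, m, q):
    """Binomial probability with parameters m and q."""
    return comb(m, x) * q**x * (1 - q) ** (m - x)


# Illustrative (assumed) values for the two Poisson means and for y.
lam1, lam2, y = 2.0, 3.5, 6
q = lam1 / (lam1 + lam2)
for x in range(y + 1):
    assert isclose(conditional_pf(x, y, lam1, lam2), binomial_pf(x, y, q))
```

The check passes for every x from 0 to y, which is what the algebra in the example predicts; any other positive means could be substituted.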


Note that (16.1) may be rewritten as

\[ f_{X,Y}(x,y) = f_{X|Y}(x|y) f_Y(y), \tag{16.2} \]

demonstrating that joint distributions may be constructed from products of
conditional and marginal distributions. Because the marginal distribution of
X may be obtained by integrating (or summing) y out of the joint distribution,

\[ f_X(x) = \int f_{X,Y}(x,y)\,dy, \]

we find using (16.2) that

\[ f_X(x) = \int f_{X|Y}(x|y) f_Y(y)\,dy. \tag{16.3} \]


Formula (16.3) has an interesting interpretation as a mixed distribution (see
Section 4.4.5). To see this, assume that the conditional distribution $f_{X|Y}(x|y)$
is one of the usual parametric distributions where y is the realization of a
random parameter Y with distribution $f_Y(y)$. In Section 4.6.3 it was shown
that if, given $\Theta = \theta$, X has a Poisson distribution with mean $\theta$ and $\Theta$ has
a gamma distribution, then the marginal distribution of X will be negative
binomial. Also, Example 4.30 showed that, if $X|\Theta$ has a normal distribution
with mean $\Theta$ and variance v and $\Theta$ has a normal distribution with mean $\mu$
and variance a, then the marginal distribution of X is normal with mean $\mu$
and variance a + v.


Note that the roles of X and Y in (16.2) can be interchanged, yielding

\[ f_{X|Y}(x|y) f_Y(y) = f_{Y|X}(y|x) f_X(x), \]

because both sides of this equation equal the joint distribution of X and Y.
Division by $f_Y(y)$ yields Bayes’ theorem, namely,

\[ f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) f_X(x)}{f_Y(y)}. \]
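For discrete variables, the marginal and conditional pfs and Bayes’ theorem reduce to sums and ratios over a table. The sketch below uses a small hypothetical joint pf (the probabilities and helper names are assumptions made for illustration) and exact rational arithmetic to confirm the identity.

```python
from fractions import Fraction as F

# Hypothetical joint pf f_{X,Y}(x, y), keyed by (x, y); values are illustrative.
joint = {(0, 0): F(1, 10), (0, 1): F(3, 10),
         (1, 0): F(2, 10), (1, 1): F(4, 10)}


def marginal_x(x):
    """f_X(x): sum the joint pf over y."""
    return sum(p for (xx, _), p in joint.items() if xx == x)


def marginal_y(y):
    """f_Y(y): sum the joint pf over x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)


def cond_x_given_y(x, y):
    """f_{X|Y}(x|y) from (16.1)."""
    return joint[(x, y)] / marginal_y(y)


def cond_y_given_x(y, x):
    """f_{Y|X}(y|x), with the roles of X and Y interchanged."""
    return joint[(x, y)] / marginal_x(x)


# Bayes' theorem: f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y).
for (x, y) in joint:
    assert cond_x_given_y(x, y) == cond_y_given_x(y, x) * marginal_x(x) / marginal_y(y)
```

Using `Fraction` avoids rounding, so the equality holds exactly and not merely to a tolerance.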


16.2.2 Conditional expectation 


As in the previous subsection, assume that X and Y are two random variables
and the conditional pf of X given that Y = y is $f_{X|Y}(x|y)$. Clearly, this is a
valid probability distribution, and its mean is denoted by

\[ \operatorname{E}(X|Y = y) = \int x f_{X|Y}(x|y)\,dx, \tag{16.4} \]


with the integral replaced by a sum in the discrete case. Clearly, (16.4) is a 
function of y, and it is often of interest to view this conditional expectation 
as a random variable obtained by replacing y by Y in the right-hand side of 
(16.4). Thus we can write E(X|Y) instead of the left-hand side of (16.4), and 
so E(X|Y) is itself a random variable because it is a function of the random 
variable Y. The expectation of E(X|Y) is given by 


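\[ \operatorname{E}[\operatorname{E}(X|Y)] = \operatorname{E}(X). \tag{16.5} \]

To see this, note that from (16.3) and (16.4)

\[
\begin{aligned}
\operatorname{E}[\operatorname{E}(X|Y)] &= \int \operatorname{E}(X|Y = y) f_Y(y)\,dy \\
&= \int\!\!\int x f_{X|Y}(x|y)\,dx\, f_Y(y)\,dy \\
&= \int x \int f_{X|Y}(x|y) f_Y(y)\,dy\,dx \\
&= \int x f_X(x)\,dx \\
&= \operatorname{E}(X),
\end{aligned}
\]

with a similar proof in the discrete case.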


Example 16.2 Derive the mean of the negative binomial distribution by
conditional expectation, recalling that, if $X|\Theta \sim$ Poisson($\Theta$) and $\Theta \sim$ gamma($\alpha, \beta$),
then $X \sim$ negative binomial with $r = \alpha$ and $\beta = \beta$.

We have

\[ \operatorname{E}(X|\Theta) = \Theta \]

and so

\[ \operatorname{E}(X) = \operatorname{E}[\operatorname{E}(X|\Theta)] = \operatorname{E}(\Theta). \]

From Appendix A the mean of the gamma distribution of $\Theta$ is $\alpha\beta$, and so
$\operatorname{E}(X) = \alpha\beta$. □


It is often convenient to replace X by an arbitrary function h(X,Y) in
(16.4), yielding the more general definition

\[ \operatorname{E}[h(X,Y)|Y = y] = \int h(x,y) f_{X|Y}(x|y)\,dx. \]

Similarly, E[h(X,Y)|Y] is the conditional expectation viewed as a random
variable which is a function of Y. Then, (16.5) generalizes to

\[ \operatorname{E}\{\operatorname{E}[h(X,Y)|Y]\} = \operatorname{E}[h(X,Y)]. \tag{16.6} \]

To see (16.6), note that

\[
\begin{aligned}
\operatorname{E}\{\operatorname{E}[h(X,Y)|Y]\} &= \int \operatorname{E}[h(X,Y)|Y = y] f_Y(y)\,dy \\
&= \int\!\!\int h(x,y) f_{X|Y}(x|y)\,dx\, f_Y(y)\,dy \\
&= \int\!\!\int h(x,y) [f_{X|Y}(x|y) f_Y(y)]\,dx\,dy \\
&= \int\!\!\int h(x,y) f_{X,Y}(x,y)\,dx\,dy \\
&= \operatorname{E}[h(X,Y)]
\end{aligned}
\]

from (16.2).

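If we choose $h(X,Y) = [X - \operatorname{E}(X|Y)]^2$, then its expected value, based on
the conditional distribution of X given Y, is the variance of this conditional
distribution,

\[ \operatorname{Var}(X|Y) = \operatorname{E}\{[X - \operatorname{E}(X|Y)]^2 \,|\, Y\}. \tag{16.7} \]

Clearly, (16.7) is still a function of the random variable Y.
It is instructive now to analyze the variance of X where X and Y are two
random variables. To begin, note that (16.7) may be written as

\[ \operatorname{Var}(X|Y) = \operatorname{E}(X^2|Y) - [\operatorname{E}(X|Y)]^2. \]

Thus,

\[
\begin{aligned}
\operatorname{E}[\operatorname{Var}(X|Y)] &= \operatorname{E}\{\operatorname{E}(X^2|Y) - [\operatorname{E}(X|Y)]^2\} \\
&= \operatorname{E}[\operatorname{E}(X^2|Y)] - \operatorname{E}\{[\operatorname{E}(X|Y)]^2\} \\
&= \operatorname{E}(X^2) - \operatorname{E}\{[\operatorname{E}(X|Y)]^2\}.
\end{aligned}
\]

Also, because $\operatorname{Var}[h(Y)] = \operatorname{E}\{[h(Y)]^2\} - \{\operatorname{E}[h(Y)]\}^2$, we may use
$h(Y) = \operatorname{E}(X|Y)$ to obtain

\[
\begin{aligned}
\operatorname{Var}[\operatorname{E}(X|Y)] &= \operatorname{E}\{[\operatorname{E}(X|Y)]^2\} - \{\operatorname{E}[\operatorname{E}(X|Y)]\}^2 \\
&= \operatorname{E}\{[\operatorname{E}(X|Y)]^2\} - [\operatorname{E}(X)]^2.
\end{aligned}
\]

Thus,

\[
\begin{aligned}
\operatorname{E}[\operatorname{Var}(X|Y)] + \operatorname{Var}[\operatorname{E}(X|Y)] &= \operatorname{E}(X^2) - \operatorname{E}\{[\operatorname{E}(X|Y)]^2\} + \operatorname{E}\{[\operatorname{E}(X|Y)]^2\} - [\operatorname{E}(X)]^2 \\
&= \operatorname{E}(X^2) - [\operatorname{E}(X)]^2 \\
&= \operatorname{Var}(X).
\end{aligned}
\]

Thus, we have established the important formula

\[ \operatorname{Var}(X) = \operatorname{E}[\operatorname{Var}(X|Y)] + \operatorname{Var}[\operatorname{E}(X|Y)]. \tag{16.8} \]

Formula (16.8) states that the variance of X is composed of the sum of two
parts: the mean of the conditional variance plus the variance of the conditional
mean.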
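Formulas (16.5) and (16.8) can be verified on any small discrete joint distribution. The sketch below is an illustration only: the joint pf and all variable names are hypothetical. It computes both sides of each identity with exact fractions.

```python
from fractions import Fraction as F

# Hypothetical joint pf f_{X,Y}(x, y), keyed by (x, y).
joint = {(0, 0): F(1, 10), (1, 0): F(2, 10), (2, 0): F(1, 10),
         (0, 1): F(3, 10), (1, 1): F(1, 10), (2, 1): F(2, 10)}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
f_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}  # marginal pf of Y


def cond_moment(y, k):
    """E(X^k | Y = y), the discrete analogue of (16.4)."""
    return sum(x**k * joint[(x, y)] for x in xs) / f_y[y]


cond_mean = {y: cond_moment(y, 1) for y in ys}
cond_var = {y: cond_moment(y, 2) - cond_mean[y] ** 2 for y in ys}

# Unconditional mean and variance of X computed directly from the joint pf.
mean_x = sum(x * p for (x, _), p in joint.items())
var_x = sum(x * x * p for (x, _), p in joint.items()) - mean_x**2

# Moments of the random variables E(X|Y) and Var(X|Y).
mean_of_cond_mean = sum(cond_mean[y] * f_y[y] for y in ys)
mean_of_cond_var = sum(cond_var[y] * f_y[y] for y in ys)
var_of_cond_mean = sum(cond_mean[y] ** 2 * f_y[y] for y in ys) - mean_of_cond_mean**2

assert mean_of_cond_mean == mean_x                    # (16.5)
assert mean_of_cond_var + var_of_cond_mean == var_x   # (16.8)
```

For this table most of the variance of X comes from the mean of the conditional variance, because the two conditional means differ only slightly.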
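Example 16.3 Derive the variance of the negative binomial distribution.

The Poisson distribution has equal mean and variance, that is,

\[ \operatorname{E}(X|\Theta) = \operatorname{Var}(X|\Theta) = \Theta, \]

and so, from (16.8),

\[
\begin{aligned}
\operatorname{Var}(X) &= \operatorname{E}[\operatorname{Var}(X|\Theta)] + \operatorname{Var}[\operatorname{E}(X|\Theta)] \\
&= \operatorname{E}(\Theta) + \operatorname{Var}(\Theta).
\end{aligned}
\]

Because $\Theta$ itself has a gamma distribution with parameters $\alpha$ and $\beta$,
$\operatorname{E}(\Theta) = \alpha\beta$ and $\operatorname{Var}(\Theta) = \alpha\beta^2$. Thus the variance of the
negative binomial distribution is

\[
\begin{aligned}
\operatorname{Var}(X) &= \operatorname{E}(\Theta) + \operatorname{Var}(\Theta) \\
&= \alpha\beta + \alpha\beta^2 \\
&= \alpha\beta(1 + \beta).
\end{aligned}
\]
□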
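The moments obtained in Examples 16.2 and 16.3 can be compared with moments computed directly from the negative binomial pf. This is a sketch under stated assumptions: the pf is taken in the form with parameters r and β whose mean is rβ, the parameter values are illustrative, and the infinite sums are truncated at a point where the remaining tail is negligible.

```python
from math import exp, isclose, lgamma, log


def neg_binomial_pf(k, r, beta):
    """Negative binomial pf in the (r, beta) form: mean r*beta (assumed form)."""
    log_p = (lgamma(r + k) - lgamma(r) - lgamma(k + 1)
             + k * log(beta / (1 + beta)) - r * log(1 + beta))
    return exp(log_p)


def moments_by_summation(r, beta, terms=2000):
    """Mean and variance from a truncated sum over the pf."""
    probs = [neg_binomial_pf(k, r, beta) for k in range(terms)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second - mean**2


alpha, beta = 2.5, 0.8  # illustrative gamma parameters, so r = alpha
mean, var = moments_by_summation(alpha, beta)
assert isclose(mean, alpha * beta, rel_tol=1e-9)              # Example 16.2
assert isclose(var, alpha * beta * (1 + beta), rel_tol=1e-9)  # Example 16.3
```

Working on the log scale with `lgamma` keeps the pf stable for noninteger r and for large k.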


Example 16.4 It was shown in Example 4.30 that, if $X|\Theta$ is normally
distributed with mean $\Theta$ and variance v where $\Theta$ is itself normally distributed
with mean $\mu$ and variance a, then X (unconditionally) is normally distributed
with mean $\mu$ and variance a + v. Use (16.5) and (16.8) to obtain the mean
and variance of X directly.


For the mean we have

\[ \operatorname{E}(X) = \operatorname{E}[\operatorname{E}(X|\Theta)] = \operatorname{E}(\Theta) = \mu \]

and for the variance we obtain

\[
\begin{aligned}
\operatorname{Var}(X) &= \operatorname{E}[\operatorname{Var}(X|\Theta)] + \operatorname{Var}[\operatorname{E}(X|\Theta)] \\
&= \operatorname{E}(v) + \operatorname{Var}(\Theta) \\
&= v + a
\end{aligned}
\]

because v is a constant. □
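Example 16.5 Consider a compound Poisson distribution with Poisson mean
$\lambda$, where $X = Y_1 + \cdots + Y_N$ with $\operatorname{E}(Y_j) = \mu_Y$ and $\operatorname{Var}(Y_j) = \sigma_Y^2$. Determine
the mean and variance of X.


Formula (16.8) was used in Chapter 6 to obtain the answers:

\[ \operatorname{E}(X) = \lambda\mu_Y \quad\text{and}\quad \operatorname{Var}(X) = \lambda(\mu_Y^2 + \sigma_Y^2). \]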
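For reference, the conditioning argument behind these answers can be sketched as follows, assuming (as is usual for the compound model) that the $Y_j$ are independent and identically distributed and are independent of the Poisson claim count N. Given N, X is a sum of N such terms, so $\operatorname{E}(X|N) = N\mu_Y$ and $\operatorname{Var}(X|N) = N\sigma_Y^2$. Then (16.5) and (16.8) give

\[
\begin{aligned}
\operatorname{E}(X) &= \operatorname{E}[\operatorname{E}(X|N)] = \operatorname{E}(N)\mu_Y = \lambda\mu_Y, \\
\operatorname{Var}(X) &= \operatorname{E}[\operatorname{Var}(X|N)] + \operatorname{Var}[\operatorname{E}(X|N)] \\
&= \operatorname{E}(N)\sigma_Y^2 + \operatorname{Var}(N)\mu_Y^2 \\
&= \lambda\sigma_Y^2 + \lambda\mu_Y^2 = \lambda(\mu_Y^2 + \sigma_Y^2),
\end{aligned}
\]

where the last line uses $\operatorname{E}(N) = \operatorname{Var}(N) = \lambda$ for the Poisson distribution.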


16.2.3 Nonparametric unbiased estimators 


Unbiased estimation was covered in Section 9.2.2. It plays an important role 
in the development of credibility formulas. We begin by showing that two 
commonly used estimators are unbiased. 

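Theorem 16.6 If $X_1,\ldots,X_n$ are independent but not necessarily identically
distributed with common mean $\mu = \operatorname{E}(X_j)$ and common variance $v = \operatorname{Var}(X_j)$,
then $\bar{X} = n^{-1}\sum_{j=1}^{n} X_j$ is an unbiased estimator of $\mu$ and

\[ \hat{v} = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar{X})^2 \tag{16.9} \]

is an unbiased estimator of v.


Proof: For $\bar{X}$, we have

\[ \operatorname{E}(\bar{X}) = \operatorname{E}\left(\frac{1}{n}\sum_{j=1}^{n} X_j\right) = \frac{1}{n}\sum_{j=1}^{n} \operatorname{E}(X_j) = \mu. \]

For the variance estimator, begin with the following result, which will be used
later:

\[
\begin{aligned}
\sum_{j=1}^{n} (X_j - \bar{X})^2 &= \sum_{j=1}^{n} (X_j - \mu + \mu - \bar{X})^2 \\
&= \sum_{j=1}^{n} (X_j - \mu)^2 + 2\sum_{j=1}^{n} (X_j - \mu)(\mu - \bar{X}) + \sum_{j=1}^{n} (\mu - \bar{X})^2 \\
&= \sum_{j=1}^{n} (X_j - \mu)^2 + 2(\mu - \bar{X})\sum_{j=1}^{n} (X_j - \mu) + n(\mu - \bar{X})^2 \\
&= \sum_{j=1}^{n} (X_j - \mu)^2 + 2(\mu - \bar{X})n(\bar{X} - \mu) + n(\mu - \bar{X})^2 \\
&= \sum_{j=1}^{n} (X_j - \mu)^2 - n(\bar{X} - \mu)^2.
\end{aligned}
\tag{16.10}
\]

We also have (from the independence of $X_1,\ldots,X_n$),

\[ \operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{j=1}^{n} X_j\right) = \frac{1}{n^2}\sum_{j=1}^{n} \operatorname{Var}(X_j) = \frac{v}{n}. \]

Take expectations in (16.10) to obtain

\[
\begin{aligned}
\operatorname{E}\left[\sum_{j=1}^{n} (X_j - \bar{X})^2\right] &= \operatorname{E}\left[\sum_{j=1}^{n} (X_j - \mu)^2 - n(\bar{X} - \mu)^2\right] \\
&= \sum_{j=1}^{n} \operatorname{E}[(X_j - \mu)^2] - n\operatorname{Var}(\bar{X}) \\
&= nv - n\left(\frac{v}{n}\right) \\
&= (n - 1)v.
\end{aligned}
\]

Dividing both sides by n − 1 demonstrates that $\hat{v}$ is an unbiased estimator of
v. □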
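Theorem 16.6 does not require identical distributions, only a common mean and variance. The sketch below illustrates this with three hypothetical discrete distributions (all with mean 0 and variance 1, chosen for this example) and computes the expected values of the sample mean and of (16.9) exactly by enumerating every outcome.

```python
from fractions import Fraction as F
from itertools import product

# Three independent variables with common mean 0 and common variance 1,
# but not identically distributed (hypothetical illustration).
dists = [
    {-1: F(1, 2), 1: F(1, 2)},
    {-2: F(1, 8), 0: F(3, 4), 2: F(1, 8)},
    {-1: F(1, 2), 1: F(1, 2)},
]


def sample_variance(xs):
    """The estimator (16.9), with divisor n - 1."""
    n = len(xs)
    xbar = F(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


expected_mean = F(0)
expected_var = F(0)
for outcome in product(*(d.items() for d in dists)):
    xs = [x for x, _ in outcome]
    prob = F(1)
    for _, p in outcome:
        prob *= p  # independence: joint probability is the product
    expected_mean += prob * F(sum(xs), len(xs))
    expected_var += prob * sample_variance(xs)

assert expected_mean == 0  # the sample mean is unbiased for mu = 0
assert expected_var == 1   # (16.9) is unbiased for v = 1
```

Replacing the divisor n − 1 by n in `sample_variance` would make the second assertion fail, which is the bias the theorem removes.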


The following example generalizing these results may appear somewhat
artificial at this point, but is important in connection with the
Bühlmann-Straub model of Section 16.4.5.


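Example 16.7 Suppose $X_1,\ldots,X_n$ are independent with common mean $\mu = \operatorname{E}(X_j)$
and variance $\operatorname{Var}(X_j) = \beta + \alpha/m_j$, $\alpha, \beta > 0$, and all $m_j \ge 1$. Let
$m = \sum_{j=1}^{n} m_j$ and consider the three estimators

\[
\bar{X} = \frac{1}{m}\sum_{j=1}^{n} m_j X_j, \qquad
\hat{\mu}_1 = \frac{1}{n}\sum_{j=1}^{n} X_j, \qquad\text{and}\qquad
\hat{\mu}_2 = \frac{\sum_{j=1}^{n} \dfrac{m_j}{m_j\beta + \alpha} X_j}{\sum_{j=1}^{n} \dfrac{m_j}{m_j\beta + \alpha}}.
\]

Show that all three estimators are unbiased for $\mu$ and then rank them in order
by mean-squared error. Also obtain the expected value of a sum of squares
that may be useful for estimating $\alpha$ and $\beta$.


First consider $\bar{X}$.

\[
\begin{aligned}
\operatorname{E}(\bar{X}) &= m^{-1}\sum_{j=1}^{n} m_j \operatorname{E}(X_j) = m^{-1}\sum_{j=1}^{n} m_j \mu = \mu, \\
\operatorname{Var}(\bar{X}) &= m^{-2}\sum_{j=1}^{n} m_j^2 \operatorname{Var}(X_j) \\
&= m^{-2}\sum_{j=1}^{n} m_j^2 \left(\beta + \frac{\alpha}{m_j}\right) \\
&= \alpha m^{-1} + \beta m^{-2}\sum_{j=1}^{n} m_j^2.
\end{aligned}
\]

The estimator $\hat{\mu}_1$ is the one defined in Theorem 16.6 and has already been
shown to be unbiased. We also have

\[
\begin{aligned}
\operatorname{Var}(\hat{\mu}_1) &= n^{-2}\sum_{j=1}^{n} \operatorname{Var}(X_j) \\
&= n^{-2}\sum_{j=1}^{n} \left(\beta + \frac{\alpha}{m_j}\right) \\
&= \beta n^{-1} + n^{-2}\alpha\sum_{j=1}^{n} m_j^{-1}.
\end{aligned}
\]

With regard to $\hat{\mu}_2$,

\[
\operatorname{E}(\hat{\mu}_2) = \frac{\sum_{j=1}^{n} \dfrac{m_j}{m_j\beta + \alpha}\,\mu}{\sum_{j=1}^{n} \dfrac{m_j}{m_j\beta + \alpha}} = \mu
\]

and

\[
\begin{aligned}
\operatorname{Var}(\hat{\mu}_2) &= \left(\sum_{j=1}^{n} \frac{m_j}{m_j\beta + \alpha}\right)^{-2} \sum_{j=1}^{n} \left(\frac{m_j}{m_j\beta + \alpha}\right)^{2} \left(\beta + \frac{\alpha}{m_j}\right) \\
&= \left(\sum_{j=1}^{n} \frac{m_j}{m_j\beta + \alpha}\right)^{-2} \sum_{j=1}^{n} \frac{m_j}{m_j\beta + \alpha} \\
&= \left(\sum_{j=1}^{n} \frac{m_j}{m_j\beta + \alpha}\right)^{-1}.
\end{aligned}
\]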


We now consider the relative ranking of these variances (because all three
estimators are unbiased, their mean-squared errors equal their variances, so
it is sufficient to rank the variances). To show that it is not possible to order
$\operatorname{Var}(\hat{\mu}_1)$ and $\operatorname{Var}(\bar{X})$, examine their difference:

\[
\operatorname{Var}(\bar{X}) - \operatorname{Var}(\hat{\mu}_1) = \alpha\left(m^{-1} - n^{-2}\sum_{j=1}^{n} m_j^{-1}\right) + \beta\left(m^{-2}\sum_{j=1}^{n} m_j^2 - n^{-1}\right).
\]

The coefficient of $\beta$ must be nonnegative. To see this, note that

\[ \frac{1}{n}\sum_{j=1}^{n} m_j^2 \ge \left(\frac{1}{n}\sum_{j=1}^{n} m_j\right)^{2} = \frac{m^2}{n^2} \]

(the left-hand side is like a sample second moment and the right-hand side is
like the square of the sample mean) and then multiply both sides by $nm^{-2}$.
To show that the coefficient of $\alpha$ must be nonpositive, note that

\[ \frac{n}{\sum_{j=1}^{n} m_j^{-1}} \le \frac{1}{n}\sum_{j=1}^{n} m_j = \frac{m}{n} \]

(the harmonic mean is always less than or equal to the arithmetic mean)
and then multiply both sides by n and then invert both sides. Therefore, by
suitable choice of $\alpha$ and $\beta$, the difference in the variances can be made positive
or negative.

We can do more than just show that $\hat{\mu}_2$ has the smallest variance of the
three. Consider an arbitrary estimator of the form $\hat{\mu} = \sum_{j=1}^{n} a_j X_j$, where
$\sum_{j=1}^{n} a_j = 1$ (needed to ensure that $\hat{\mu}$ is unbiased). All three estimators are
of this type. Incorporating the constraint by using Lagrange multipliers, the
smallest variance is found by minimizing

\[ \sum_{j=1}^{n} a_j^2 \operatorname{Var}(X_j) + \lambda\left(\sum_{j=1}^{n} a_j - 1\right). \]

The derivative with regard to $a_i$ is

\[ 2a_i \operatorname{Var}(X_i) + \lambda, \]

and setting it equal to zero gives $a_i = -\lambda[2\operatorname{Var}(X_i)]^{-1}$. In other words,
the weights should be proportional to the reciprocal of the variance. These
are precisely the weights used in $\hat{\mu}_2$, and therefore it must have the smallest
variance of all unbiased linear estimators.


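With regard to a sum of squares, consider

\[
\begin{aligned}
\sum_{j=1}^{n} m_j (X_j - \bar{X})^2 &= \sum_{j=1}^{n} m_j (X_j - \mu + \mu - \bar{X})^2 \\
&= \sum_{j=1}^{n} m_j (X_j - \mu)^2 + 2\sum_{j=1}^{n} m_j (X_j - \mu)(\mu - \bar{X}) + \sum_{j=1}^{n} m_j (\mu - \bar{X})^2 \\
&= \sum_{j=1}^{n} m_j (X_j - \mu)^2 + 2(\mu - \bar{X})\sum_{j=1}^{n} m_j (X_j - \mu) + m(\mu - \bar{X})^2 \\
&= \sum_{j=1}^{n} m_j (X_j - \mu)^2 + 2(\mu - \bar{X})m(\bar{X} - \mu) + m(\mu - \bar{X})^2 \\
&= \sum_{j=1}^{n} m_j (X_j - \mu)^2 - m(\bar{X} - \mu)^2.
\end{aligned}
\tag{16.11}
\]

Taking expectations yields

\[
\begin{aligned}
\operatorname{E}\left[\sum_{j=1}^{n} m_j (X_j - \bar{X})^2\right] &= \sum_{j=1}^{n} m_j \operatorname{E}[(X_j - \mu)^2] - m\operatorname{E}[(\bar{X} - \mu)^2] \\
&= \sum_{j=1}^{n} m_j \operatorname{Var}(X_j) - m\operatorname{Var}(\bar{X}) \\
&= \sum_{j=1}^{n} m_j \left(\beta + \frac{\alpha}{m_j}\right) - \beta\left(m^{-1}\sum_{j=1}^{n} m_j^2\right) - \alpha
\end{aligned}
\]

and thus

\[
\operatorname{E}\left[\sum_{j=1}^{n} m_j (X_j - \bar{X})^2\right] = \beta\left(m - m^{-1}\sum_{j=1}^{n} m_j^2\right) + \alpha(n - 1). \tag{16.12}
\]

In addition to being of interest in its own right, (16.12) provides an unbiased
estimator in situations more general than (16.9). The latter is recovered with
the choice $\alpha = 0$ and $m_j = 1$ for j = 1, 2, ..., n, implying that m = n. Also,
if $\beta = 0$, (16.12) allows us to derive an estimator of $\alpha$ when each $X_j$ is the
average of $m_j$ independent observations each with mean $\mu$ and variance $\alpha$. In
any event, it is usually the case that the $m_j$s (and hence m) are known. □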
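The conclusions of Example 16.7 can be illustrated numerically. The following sketch is not from the text: the weights $m_j$ and the values of α and β are hypothetical, and the function names are introduced only for this illustration. It computes the three variances exactly and checks (16.12) against an independent calculation based on $\operatorname{Var}(X_j - \bar{X})$.

```python
from fractions import Fraction as F


def variances(m, alpha, beta):
    """Exact Var(Xbar), Var(mu1_hat), Var(mu2_hat) for Var(X_j) = beta + alpha/m_j."""
    n, total = len(m), sum(m)
    var_x = [beta + F(alpha) / mj for mj in m]
    var_xbar = sum(mj**2 * v for mj, v in zip(m, var_x)) / total**2
    var_mu1 = sum(var_x) / n**2
    var_mu2 = 1 / sum(1 / v for v in var_x)  # weights proportional to 1/Var(X_j)
    return var_xbar, var_mu1, var_mu2


def expected_weighted_ss(m, alpha, beta):
    """E[sum m_j (X_j - Xbar)^2] computed term by term, without using (16.12)."""
    total = sum(m)
    var_x = [beta + F(alpha) / mj for mj in m]
    var_xbar = sum(mj**2 * v for mj, v in zip(m, var_x)) / total**2
    # By independence, Var(X_j - Xbar) = Var(X_j)(1 - 2 m_j/m) + Var(Xbar).
    return sum(mj * (v * (1 - F(2 * mj, total)) + var_xbar)
               for mj, v in zip(m, var_x))


def formula_16_12(m, alpha, beta):
    """Right-hand side of (16.12)."""
    n, total = len(m), sum(m)
    return beta * (total - F(sum(mj**2 for mj in m), total)) + alpha * (n - 1)


m = [1, 2, 5, 10]  # hypothetical exposures
assert expected_weighted_ss(m, F(3), F(1, 2)) == formula_16_12(m, F(3), F(1, 2))

# The ordering of Var(Xbar) and Var(mu1_hat) depends on alpha and beta,
# while Var(mu2_hat) is never larger than either.
a1, b1, c1 = variances(m, F(10), F(1, 100))
a2, b2, c2 = variances(m, F(1, 100), F(10))
assert a1 < b1 and a2 > b2
assert c1 <= min(a1, b1) and c2 <= min(a2, b2)
```

When α dominates, the exposure-weighted mean has the smaller variance of the first two estimators; when β dominates, the unweighted mean does, which matches the signs of the two coefficients derived in the example.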
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16.2.4 Exercises 


16.1 Suppose X is binomially distributed with parameters $n_1$ and p, that is,

\[ f_X(x) = \binom{n_1}{x} p^{x} (1 - p)^{n_1 - x}, \quad x = 0, 1, 2, \ldots, n_1. \]

Suppose also that Z is binomially distributed with parameters $n_2$ and p
independently of X. Then Y = X + Z is binomially distributed with parameters
$n_1 + n_2$ and p. Find the conditional distribution of X given that Y = y.


16.2 Let X and Y have joint probability distribution as follows: 


            y
  x     0      1      2
  0    0.20   0      0.10
  1    0      0.15   0.25
  2    0.05   0.15   0.10


(a) Compute the marginal distributions of X and Y.


(b) Compute the conditional distribution of X given Y = y for y =
0, 1, 2.


(c) Compute $\operatorname{E}(X|y)$, $\operatorname{E}(X^2|y)$, and $\operatorname{Var}(X|y)$ for y = 0, 1, 2.
(d) Compute E(X) and Var(X) using (16.5), (16.8), and (c).


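16.3 Suppose that X and Y are two random variables with bivariate normal
joint density function

\[
f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_1}{\sigma_1}\right)^{2} - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^{2} \right] \right\}.
\]

Show that:


(a) The conditional density function is

\[
f_{X|Y}(x|y) = \frac{1}{\sqrt{2\pi}\,\sigma_1\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2} \left[ \frac{x - \mu_1 - \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2)}{\sigma_1\sqrt{1 - \rho^2}} \right]^{2} \right\}.
\]

Hence,

\[ \operatorname{E}(X|Y = y) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2). \]


(b) The marginal pdf is

\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[ -\frac{1}{2}\left(\frac{x - \mu_1}{\sigma_1}\right)^{2} \right]. \]


(c) The variables X and Y are independent if and only if $\rho = 0$.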
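16.4 Suppose that the random variables $Y_1,\ldots,Y_n$ are independent with

\[ \operatorname{E}(Y_j) = \gamma \quad\text{and}\quad \operatorname{Var}(Y_j) = a_j + \sigma^2/b_j, \quad j = 1, 2, \ldots, n. \]

Define $b = b_1 + b_2 + \cdots + b_n$ and $\bar{Y} = \sum_{j=1}^{n} \frac{b_j}{b} Y_j$. Prove that

\[
\operatorname{E}\left[\sum_{j=1}^{n} b_j (Y_j - \bar{Y})^2\right] = (n - 1)\sigma^2 + \sum_{j=1}^{n} a_j \left(b_j - \frac{b_j^2}{b}\right).
\]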


16.5 Suppose that given $\Theta = (\Theta_1, \Theta_2)$ the random variable X is normally
distributed with mean $\Theta_1$ and variance $\Theta_2$.


(a) Show that $\operatorname{E}(X) = \operatorname{E}(\Theta_1)$ and $\operatorname{Var}(X) = \operatorname{E}(\Theta_2) + \operatorname{Var}(\Theta_1)$.
(b) If $\Theta_1$ and $\Theta_2$ are independent, show that X has the same
distribution as $\Theta_1 + Y$, where $\Theta_1$ and Y are independent and Y conditional
on $\Theta_2$ is normally distributed with mean 0 and variance $\Theta_2$.


16.6 Suppose that $\Theta$ has pdf $\pi(\theta)$, $\theta > 0$, and $\Theta_1$ has pdf $\pi_1(\theta) = \pi(\theta - \alpha)$,
$\theta > \alpha > 0$. If, given $\Theta_1$, X is Poisson distributed with mean $\Theta_1$, show that X
has the same distribution as Y + Z, where Y and Z are independent, Y is
Poisson distributed with mean $\alpha$, and $Z|\Theta$ is Poisson distributed with mean
$\Theta$.


16.3 LIMITED FLUCTUATION CREDIBILITY THEORY 


This branch of credibility theory represents the first attempt to quantify the
credibility problem. This approach was suggested in the early part of the
twentieth century in connection with workers compensation insurance. The original
paper on the subject was by Mowbray in 1914 [96]. The problem may be
formulated as follows. Suppose that a policyholder has experienced $X_j$ claims
or losses³ in past experience period j, where $j \in \{1, 2, 3, \ldots, n\}$. Another
view is that $X_j$ is the experience from the jth policy in a group or from the
jth member of a particular class in a rating scheme. Suppose that $\operatorname{E}(X_j) = \xi$,
that is, the mean is stable over time or across the members of a group
or class.⁴ This quantity would be the premium to charge (net of expenses,
profits, and a provision for adverse experience) if only we knew its value.
Also suppose $\operatorname{Var}(X_j) = \sigma^2$, again, the same for all j. The past experience
may be summarized by the average $\bar{X} = n^{-1}(X_1 + \cdots + X_n)$. We know that
$\operatorname{E}(\bar{X}) = \xi$, and if the $X_j$ are independent, $\operatorname{Var}(\bar{X}) = \sigma^2/n$. The insurer’s
goal is to decide on the value of $\xi$. One possibility is to ignore the past data
(no credibility) and simply charge M, a value obtained from experience on
other similar but not identical policyholders. This quantity is often called the
manual premium because it would come from a book (manual) of premiums.
Another possibility is to ignore M and charge $\bar{X}$ (full credibility). A third
possibility is to choose some combination of M and $\bar{X}$ (partial credibility).


3 “Claims” will refer to the number of claims and “losses” will refer to payment amounts.
In many cases, such as in this introductory paragraph, the ideas apply equally whether we
are counting claims or losses.

From the insurer’s standpoint, it seems sensible to “lean toward” the choice
$\bar{X}$ if the experience is more “stable” (less variable, $\sigma^2$ small). This implies
that $\bar{X}$ is of more use as a predictor of next year’s results. Conversely, if the
experience is more volatile (variable), then $\bar{X}$ is of less use as a predictor of
next year’s results and the choice M makes more sense.

Also, if we have an a priori reason to believe that the chances are great that
this policyholder is unlike those who produced the manual premium M, then
more weight should be given to $\bar{X}$. This is because as an unbiased estimator
$\bar{X}$ tells us something useful about $\xi$ while M is likely to be of little value.
On the other hand, if all of our other policyholders have similar values of $\xi$,
there is no point in relying on the (perhaps limited) experience of any one of
them when M is likely to provide an excellent description of the propensity
for claims or losses.

While reference is made to policyholders, the entity contributing to each 
X; could arise from a single policyholder, a class of policyholders possessing 
similar underwriting characteristics, or a group of insureds assembled for some 
other reason. For example, for a given year j, X; could be the number of 
claims filed in respect of a single automobile policy in one year, the average 
number of claims filed by all policyholders in a certain ratings class (e.g., 
single, male, under age 25, living in an urban area, driving over 7,500 miles 
per year), or the average amount of losses per vehicle for a fleet of delivery 
trucks owned by a food wholesaler. 

We first present one approach to decide whether to assign full credibility
(charge $\bar{X}$), and then we present an approach to assign partial credibility if it
is felt that full credibility is inappropriate.


4 The customary symbol for the mean, $\mu$, is not used here because that symbol is used for a
different but related mean in the next section. We have chosen this particular symbol (“Xi”)
because it is the most difficult Greek letter to write and pronounce. It is an unwritten rule
of textbook writing that it appear at least once.
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16.3.1 Full credibility 


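One method of quantifying the stability of $\bar{X}$ is to infer that $\bar{X}$ is stable if
the difference between $\bar{X}$ and $\xi$ is small relative to $\xi$ with high probability.
In statistical terms, this means that we should select two numbers r > 0 and
0 < p < 1 (with r close to 0 and p close to 1, common choices being r = 0.05
and p = 0.9) and assign full credibility if

\[ \Pr(-r\xi \le \bar{X} - \xi \le r\xi) \ge p. \tag{16.13} \]

It is convenient to restate (16.13) as

\[ \Pr\left( \left| \frac{\bar{X} - \xi}{\sigma/\sqrt{n}} \right| \le \frac{r\xi\sqrt{n}}{\sigma} \right) \ge p. \]

Now let $y_p$ be defined by

\[ y_p = \inf_{y} \left\{ \Pr\left( \left| \frac{\bar{X} - \xi}{\sigma/\sqrt{n}} \right| \le y \right) \ge p \right\}. \tag{16.14} \]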


That is, $y_p$ is the smallest value of y which satisfies the probability statement
in braces in (16.14). If $\bar{X}$ has a continuous distribution, the “≥” sign in
(16.14) may be replaced by an “=” sign and $y_p$ satisfies

\[ \Pr\left( \left| \frac{\bar{X} - \xi}{\sigma/\sqrt{n}} \right| \le y_p \right) = p. \tag{16.15} \]

Then the condition for full credibility is $r\xi\sqrt{n}/\sigma \ge y_p$,

\[ \frac{\sigma}{\xi} \le \frac{r}{y_p}\sqrt{n} = \sqrt{\frac{n}{\lambda_0}}, \tag{16.16} \]

where $\lambda_0 = (y_p/r)^2$. Condition (16.16) states that full credibility is assigned
if the coefficient of variation $\sigma/\xi$ is no larger than $\sqrt{n/\lambda_0}$, an intuitively
reasonable result.


Also of interest is that (16.16) can be rewritten to show that full credibility
occurs when

\[ \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \le \frac{\xi^2}{\lambda_0}. \tag{16.17} \]

Alternatively, solving (16.16) for n gives the number of exposure units
required for full credibility, namely

\[ n \ge \lambda_0 \left(\frac{\sigma}{\xi}\right)^{2}. \tag{16.18} \]


In many situations it is reasonable to approximate the distribution of $\bar{X}$ by
a normal distribution with mean $\xi$ and variance $\sigma^2/n$. For example, central
limit theorem arguments may be applicable if n is large. In that case
$(\bar{X} - \xi)/(\sigma/\sqrt{n})$ has a standard normal distribution. Then (16.15) becomes (where
Z has a standard normal distribution and $\Phi(y)$ is its cdf)

\[
\begin{aligned}
p &= \Pr(|Z| \le y_p) \\
&= \Pr(-y_p \le Z \le y_p) \\
&= \Phi(y_p) - \Phi(-y_p) \\
&= \Phi(y_p) - 1 + \Phi(y_p) \\
&= 2\Phi(y_p) - 1.
\end{aligned}
\]

Therefore $\Phi(y_p) = (1 + p)/2$ and therefore $y_p$ is the (1 + p)/2 percentile of the
standard normal distribution.

For example, if p = 0.9, then standard normal tables give $y_{0.9} = 1.645$.
If, in addition, r = 0.05, then $\lambda_0 = (32.9)^2 = 1{,}082.41$ and (16.18) yields
$n \ge 1{,}082.41\sigma^2/\xi^2$. Note that this answer assumes we know the coefficient
of variation of $X_j$. It is possible we have some idea of its value, even though
we do not know the value of $\xi$ (remember, that is the quantity we want to
estimate).
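Under the normal approximation, the calculation of $\lambda_0$ and of the exposure standard in (16.18) can be sketched in a few lines. The function names below are assumptions introduced for illustration; the percentile comes from the Python standard library and not from a table, so the result differs slightly from the value obtained with the rounded percentile 1.645.

```python
from statistics import NormalDist


def full_credibility_lambda0(r, p):
    """lambda_0 = (y_p / r)^2, with y_p the (1 + p)/2 standard normal percentile."""
    y_p = NormalDist().inv_cdf((1 + p) / 2)
    return (y_p / r) ** 2


def exposures_for_full_credibility(r, p, coefficient_of_variation):
    """Right-hand side of (16.18): lambda_0 times the squared coefficient of variation."""
    return full_credibility_lambda0(r, p) * coefficient_of_variation**2


lambda0 = full_credibility_lambda0(r=0.05, p=0.9)
# lambda0 is about 1,082.2; using the rounded table value 1.645 gives 1,082.41.
```

The small discrepancy is purely a rounding effect and has no practical consequence for the standard.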

The important thing to note when using (16.18) is that the coefficient of
variation is for the estimator of the quantity to be estimated. The right-hand
side gives the standard for full credibility when measuring it in terms of
exposure units. If some other unit is desired, it is usually sufficient to multiply
both sides by an appropriate quantity. Finally, any unknown quantities will
have to be estimated from the data. This implies that the credibility question
can be posed in a variety of ways. The following examples cover the most
common cases.


Example 16.8 Suppose past losses $X_1, \ldots, X_n$ are available for a particular
policyholder. The sample mean is to be used to estimate $\xi = E(X_j)$. Determine
the standard for full credibility. Then suppose there were 10 observations
with 6 being zero and the others being 253, 398, 439, and 756. Determine the
full-credibility standard for this case with $r = 0.05$ and $p = 0.9$.


The solution is available directly from (16.18) as

$$n \ge \lambda_0 \left(\frac{\sigma}{\xi}\right)^2.$$


For this specific case, the mean and standard deviation can be estimated from
the data as 184.6 and 267.89 (where the variance estimate is the unbiased
version using $n - 1$). With $\lambda_0 = 1{,}082.41$, the standard is

$$n \ge 1{,}082.41 \left(\frac{267.89}{184.6}\right)^2 = 2{,}279.51$$

and the 10 observations do not deserve full credibility. □

In the next example it is further assumed that the observations are from a
particular type of distribution.


Example 16.9 Suppose that past losses $X_1, \ldots, X_n$ are available for a particular
policyholder and it is reasonable to assume that the $X_j$s are independent
and compound Poisson distributed, that is, $X_j = Y_{j1} + \cdots + Y_{jN_j}$, where each
$N_j$ is Poisson with parameter $\lambda$ and the claim size distribution $Y$ has mean
$\theta_Y$ and variance $\sigma_Y^2$. Determine the standard for full credibility when
estimating the expected number of claims per policy and then when estimating the
expected dollars of claims per policy. Then determine if these standards are
met for the data in Example 16.8, where it is now known that the first three
nonzero payments came from a single claim but the final one was from two
claims, one for 129 and the other for 627.


Case 1: Accuracy is to be measured with regard to the average number of
claims. Then, using the $N_j$s rather than the $X_j$s, we have $\xi = E(N_j) = \lambda$ and
$\sigma^2 = \operatorname{Var}(N_j) = \lambda$, implying from (16.18) that

$$n \ge \lambda_0 \left(\frac{\lambda^{1/2}}{\lambda}\right)^2 = \frac{\lambda_0}{\lambda}.$$

Thus, if the standard is in terms of the number of policies, it will have to
exceed $\lambda_0/\lambda$ for full credibility and $\lambda$ will have to be estimated from the data.
If the standard is in terms of the number of expected claims, that is, $n\lambda$, we
must multiply both sides by $\lambda$. This sets the standard as

$$n\lambda \ge \lambda_0.$$

While it appears that no estimation is needed for this standard, it is in terms
of the expected number of claims needed. In practice, the standard is set in
terms of the actual number of claims experienced, effectively replacing $n\lambda$ on
the left by its estimate $N_1 + \cdots + N_n$.


For the given data, there were 5 claims, for an estimate of $\lambda$ of 0.5 per
policy. The standard is then

$$n \ge \frac{1{,}082.41}{0.5} = 2{,}164.82$$

and the 10 policies are far short of this standard. Or the 5 actual claims could
be compared to $\lambda_0 = 1{,}082.41$, which leads to the same result.

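Case 2: When accuracy is with regard to the average total payment, we have
$\xi = E(X_j) = \lambda\theta_Y$ and $\operatorname{Var}(X_j) = \lambda(\theta_Y^2 + \sigma_Y^2)$, formulas developed in Chapter
6. In terms of the sample size, the standard is

$$n \ge \lambda_0 \frac{\lambda(\theta_Y^2 + \sigma_Y^2)}{\lambda^2 \theta_Y^2}
     = \frac{\lambda_0}{\lambda}\left[1 + \left(\frac{\sigma_Y}{\theta_Y}\right)^2\right].$$

If the standard is in terms of the expected number of claims, multiply both
sides by $\lambda$ to obtain

$$n\lambda \ge \lambda_0 \left[1 + \left(\frac{\sigma_Y}{\theta_Y}\right)^2\right].$$

Finally, if the standard is in terms of the expected total dollars of claims,
multiply both sides by $\theta_Y$ to obtain

$$n\lambda\theta_Y \ge \lambda_0 \left(\theta_Y + \frac{\sigma_Y^2}{\theta_Y}\right).$$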
For the given data, the five claims have mean 369.2 and standard deviation
189.315 and thus

$$n \ge \frac{\lambda_0}{\lambda}\left[1 + \left(\frac{\sigma_Y}{\theta_Y}\right)^2\right]
     = \frac{1{,}082.41}{0.5}\left[1 + \left(\frac{189.315}{369.2}\right)^2\right] = 2{,}734.02$$

and again the 10 observations are far short of what is needed. If the standard
is to be set in terms of claims (of which there are 5), multiply both sides by
0.5 to obtain a standard of 1,367.01. Finally, the standard could be set in
terms of total dollars of claims. To do so, multiply both sides by 369.2 to
obtain 504,701. Note that in all three cases the ratio of the observed quantity
to the corresponding standard is unchanged:

$$\frac{10}{2{,}734.02} = \frac{5}{1{,}367.01} = \frac{1{,}846}{504{,}701} = 0.003658.$$
□
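The arithmetic in Examples 16.8 and 16.9 can be checked with a short script. This is a sketch only: the helper name `standard_in_exposures` and the variable names are illustrative assumptions, and $\lambda_0 = 1{,}082.41$ is taken from the text rather than recomputed. Because the script carries unrounded intermediate values, its results can differ from the printed ones in the second decimal place.

```python
from statistics import mean, stdev

LAMBDA0 = 1082.41  # (1.645 / 0.05)^2, the value used in the text


def standard_in_exposures(xi: float, sigma: float, lambda0: float = LAMBDA0) -> float:
    """Full-credibility standard (16.18): n >= lambda0 * (sigma / xi)^2."""
    return lambda0 * (sigma / xi) ** 2


# Example 16.8: ten observed losses, six of them zero.
losses = [0] * 6 + [253, 398, 439, 756]
std_16_8 = standard_in_exposures(mean(losses), stdev(losses))  # stdev uses n - 1

# Example 16.9, Case 1: Poisson claim counts (the last policy had two claims).
claim_counts = [0] * 6 + [1, 1, 1, 2]
lam_hat = mean(claim_counts)       # estimated claims per policy
std_case1 = LAMBDA0 / lam_hat      # standard in number of policies

# Example 16.9, Case 2: compound Poisson with the five individual claims.
claims = [253, 398, 439, 129, 627]
cv_sq = (stdev(claims) / mean(claims)) ** 2
std_case2 = LAMBDA0 / lam_hat * (1 + cv_sq)           # in policies
std_case2_claims = std_case2 * lam_hat                # in expected claims
std_case2_dollars = std_case2_claims * mean(claims)   # in expected dollars

print(round(std_16_8, 2), round(std_case1, 2), round(std_case2, 2))
print(round(len(losses) / std_case2, 6))  # ratio of observed to required exposure
```

In every case the ten policies fall far short of the standard, which is what motivates partial credibility in the next subsection.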


In these examples, the standard for full credibility is not met and so the 
sample means are not sufficiently accurate to be used as estimates of the 
expected value. We need a method for dealing with this situation. 


16.3.2 Partial credibility 


If it is decided that full credibility is inappropriate, then for competitive
reasons (or otherwise) it may be desirable to reflect the past experience $\bar{X}$ in the
net premium as well as the externally obtained mean, $M$. An intuitively
appealing method for doing this is through a weighted average, that is, through
the credibility premium

$$P_c = Z\bar{X} + (1 - Z)M, \tag{16.19}$$


where the credibility factor $Z \in [0, 1]$ needs to be chosen. There are many
formulas for $Z$ which have been suggested in the actuarial literature, usually
justified on intuitive rather than theoretical grounds. (We remark that Mowbray
[96] considered full, but not partial credibility.) One important choice is

$$Z = \frac{n}{n + k}, \tag{16.20}$$

where $k$ needs to be determined. This particular choice will be shown to be
theoretically justified on the basis of a statistical model to be presented in the
next section. Another choice, based on the same idea as full credibility (and
including the full-credibility case $Z = 1$), will now be discussed.

A variety of arguments have been used for developing the value of $Z$, many
of which lead to the same answer. All of them are flawed in one way or
another. The development we have chosen to present is also flawed but is
at least simple. Recall that the goal of the full-credibility standard was to
ensure that the difference between the net premium we are considering ($\bar{X}$)
and what we should be using ($\xi$) is small with high probability. Because $\bar{X}$
is unbiased, this is essentially (and exactly if $\bar{X}$ has the normal distribution)
equivalent to controlling the variance of the proposed net premium, $\bar{X}$, in this
case. We see from (16.17) that there is no assurance that the variance of $\bar{X}$
will be small enough. However, it is possible to control the variance of the
credibility premium, $P_c$, as follows:


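$$\begin{aligned}
\frac{\xi^2}{\lambda_0} &= \operatorname{Var}[Z\bar{X} + (1 - Z)M] \\
  &= Z^2 \operatorname{Var}(\bar{X}) \\
  &= Z^2 \frac{\sigma^2}{n}.
\end{aligned}$$

Thus $Z = (\xi/\sigma)\sqrt{n/\lambda_0}$, provided it is less than 1. This can be written
using the single formula

$$Z = \min\left\{\frac{\xi}{\sigma}\sqrt{\frac{n}{\lambda_0}},\ 1\right\}. \tag{16.21}$$
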
One interpretation of (16.21) is that the credibility factor $Z$ is the ratio of
the coefficient of variation required for full credibility ($\sqrt{n/\lambda_0}$) to the actual
coefficient of variation. For obvious reasons this is often called the square root
rule for partial credibility.

While we could do the algebra with regard to (16.21), it is sufficient to note
that it always turns out that $Z$ is the square root of the ratio of the actual
count to the count required for full credibility.
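As an illustration of the square root rule, here is a minimal Python sketch. The function names are hypothetical; the inputs are the figures from Example 16.8 (10 observations against a standard of 2,279.51, sample mean 184.6) together with the manual premium of 225 used in Example 16.10.

```python
import math


def credibility_factor(observed: float, full_standard: float) -> float:
    """Square root rule: Z = min(sqrt(observed / required), 1)."""
    return min(math.sqrt(observed / full_standard), 1.0)


def credibility_premium(z: float, xbar: float, manual: float) -> float:
    """Credibility premium (16.19): P_c = Z * xbar + (1 - Z) * M."""
    return z * xbar + (1.0 - z) * manual


z = credibility_factor(observed=10, full_standard=2279.51)
premium = credibility_premium(z, xbar=184.6, manual=225.0)
print(round(z, 5), round(premium, 2))
```

The `min` with 1 covers the full-credibility case: once the observed count reaches the standard, the premium is the sample mean itself.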


Example 16.10 Suppose in Example 16.8 that the manual premium $M$ is
225. Determine the credibility estimate.

The average of the payments is 184.6. With the square root rule the
credibility factor is

$$Z = \sqrt{\frac{10}{2{,}279.51}} = 0.06623.$$

Then the credibility premium is

$$P_c = 0.06623(184.6) + 0.93377(225) = 222.32.$$
□


Example 16.11 Suppose in Example 16.9 that the manual premium $M$ is
225. Determine the credibility estimate using both cases.


For the first case, the credibility factor is

$$Z = \sqrt{\frac{5}{1{,}082.41}} = 0.06797$$

and applying it yields

$$P_c = 0.06797(184.6) + 0.93203(225) = 222.25.$$

At first glance this may appear inappropriate. The standard was set in terms
of estimating the frequency but was applied to the aggregate claims. Often,
individuals are distinguished more by differences in the frequency with which
they have claims rather than by differences in the cost per claim. So this
factor captures the most important feature.

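For the second case, we can use any of the three calculations:

$$Z = \sqrt{\frac{10}{2{,}734.02}} = \sqrt{\frac{5}{1{,}367.01}} = \sqrt{\frac{1{,}846}{504{,}701}} = 0.06048.$$

Then,

$$P_c = 0.06048(184.6) + 0.93952(225) = 222.56.$$
□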


Earlier we mentioned a flaw in the approach. Other than assuming that the
variance captures the variability of $\bar{X}$ in the right way, all of the mathematics
is correct. The flaw is in the goal. Unlike $\bar{X}$, $P_c$ is not an unbiased estimator
of $\xi$. In fact, one of the qualities that allows credibility to work is its use
of biased estimators. But that means that the appropriate measure of the
quality of $P_c$ is not its variance, but its mean-squared error. However, the
mean-squared error requires knowledge of the bias, and, in turn, that requires
knowledge of the relationship of $\xi$ and $M$. However, we know nothing about
that relationship, and the data we have collected are of little help. As noted
in the next subsection, this is not only a problem with our determination of
$Z$, it is a problem that is characteristic of the limited fluctuation approach.
A model for this relationship is introduced in the next section.

This section closes with a few additional examples. In each of the first two
examples $\lambda_0 = 1{,}082.41$ is used.


Example 16.12 For group dental insurance, historical experience on many 
groups has revealed that annual losses per life insured have a mean of 175 
and a standard deviation of 140. A particular group has been covered for two 
years with 100 lives insured in year 1 and 110 in year 2 and has experienced 
average claims of 150 over that period. Determine if full or partial credibility 
is appropriate, and determine the credibility premium for next year’s losses if 
there will be 125 lives insured.

We will apply the credibility on a per-life-insured basis. We have observed
100 + 110 = 210 exposure units (assume experience is independent for different
lives and years), and $\bar{X} = 150$. Now $M = 175$ and we assume that $\sigma$ will be
140 for this group. Because we are trying to estimate the average cost per
person, the calculations done in Example 16.11 for Case 2 apply. Thus, with
$n = 210$ and $\lambda_0 = 1{,}082.41$ we estimate $\theta_Y$ with the sample mean of 150 to
obtain the standard for full credibility as

$$n \ge 1{,}082.41 \left(\frac{140}{150}\right)^2 = 942.90$$

and then calculate

$$Z = \sqrt{\frac{210}{942.90}} = 0.472$$

(note that $\bar{X}$ is the average of 210 claims, so approximate normality is assumed
by the central limit theorem). Thus, the net premium per life insured is

$$P_c = 0.472(150) + 0.528(175) = 163.2.$$

The net premium for the whole group is 125(163.2) = 20,400. □


Example 16.13 An insurance coverage involves credibility based on number
of claims only. For a particular group, 715 claims have been observed. Determine
an appropriate credibility factor, assuming that the number of claims is
Poisson distributed.


This is Case 1 from Example 16.11 and the standard for full credibility
with regard to the number of claims is $n\lambda \ge \lambda_0 = 1{,}082.41$. Then

$$Z = \sqrt{\frac{715}{1{,}082.41}} = 0.813.$$
□


Example 16.14 Past data on a particular group are $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$,
where the $X_j$ are independent and identically distributed compound Poisson
random variables with exponentially distributed claim sizes. If the credibility
factor based on claim numbers is 0.8, determine the appropriate credibility
factor based on total claims.


When based on Poisson claim numbers, from Example 16.9, $Z = 0.8$ implies
that $\lambda n/\lambda_0 = (0.8)^2 = 0.64$, where $\lambda n$ is the observed number of claims. For
exponentially distributed claim sizes $\sigma_Y^2 = \theta_Y^2$. From Case 2 of Example 16.9,
the standard for full credibility in terms of the number of claims is

$$n\lambda \ge \lambda_0 \left[1 + \left(\frac{\sigma_Y}{\theta_Y}\right)^2\right] = 2\lambda_0.$$

Then

$$Z = \sqrt{\frac{\lambda n}{2\lambda_0}} = \sqrt{0.32} = 0.566.$$
□


16.3.3 Problems with the approach


While the limited fluctuation approach yields simple solutions to the problem,
there are theoretical difficulties. First, there is no underlying theoretical model
for the distribution of the $X_j$s and thus no reason why a premium of the form
(16.19) is appropriate and preferable to $M$. Why not just estimate $\xi$ from a
collection of homogeneous policyholders and charge all policyholders the same
rate? While there is a practical reason for using (16.19), no model has been
presented to suggest that this may be appropriate. Consequently, the choice
of $Z$ (and hence $P_c$) is completely arbitrary.

Second, even if (16.19) were appropriate for a particular model, there is no 
guidance for the selection of r and p. 

Finally, the limited fluctuation approach does not examine the difference
between $\xi$ and $M$. When (16.19) is employed, we are essentially stating that
the value of $M$ is accurate as a representation of the expected value given
no information about this particular policyholder. However, it is usually the
case that $M$ is also an estimate and therefore unreliable in itself. The correct
credibility question should be “how much more reliable is $\bar{X}$ compared to $M$?”
and not “how reliable is $\bar{X}$?”

In the remainder of this chapter, a systematic modeling approach is pre- 
sented for the claims experience of a particular policyholder which suggests 
that the past experience of the policyholder is relevant for prospective rate 
making. Furthermore, the intuitively appealing formula (16.19) is a conse- 
quence of this approach, and Z is often obtained from relations of the form 
(16.20). 


16.3.4 Notes and References 


The limited fluctuation approach is discussed by Herzog [52] and
Longley-Cook [87]. See also Norberg [100].


16.3.5 Exercises 


16.7 An insurance company has decided to establish its full-credibility
requirements for an individual state rate filing. The full-credibility standard
is to be set so that the observed total amount of claims underlying the rate
filing would be within 5% of the true value with probability 0.95. The claim
frequency follows a Poisson distribution and the severity distribution has pdf

$$f(x) = \frac{100 - x}{5{,}000}, \quad 0 \le x \le 100.$$

Determine the expected number of claims necessary to obtain full credibility
using the normal approximation.


16.8 For a particular policyholder, the past total claims experience is given
by $X_1, \ldots, X_n$, where the $X_j$s are independent and identically distributed
compound random variables with Poisson parameter $\lambda$ and gamma claim size
distribution with pdf

$$f_Y(y) = \frac{y^{\alpha - 1} e^{-y/\beta}}{\Gamma(\alpha)\beta^{\alpha}}, \quad y > 0.$$

You also know the following:
1. The credibility factor based on numbers of claims is 0.9.
2. The expected claim size $\alpha\beta = 100$.
3. The credibility factor based on total claims is 0.8.
Determine $\alpha$ and $\beta$.


16.9 For a particular policyholder, the manual premium is 600 per year. The
past claims experience is given in Table 16.1. Assess whether full or partial
credibility is appropriate and determine the net premium for next year’s claims
assuming the normal approximation. Use $r = 0.05$ and $p = 0.9$.

Table 16.1 Data for Exercise 16.9

Year      1    2    3
Claims  475  550  400


16.10 Redo Example 16.10 assuming that $X_j$ is a compound negative binomial
distribution rather than compound Poisson.


16.11 (*) The total number of claims for a group of insureds is Poisson with
mean $\lambda$. Determine the value of $\lambda$ such that the observed number of claims will
be within 3% of $\lambda$ with a probability of 0.975 using the normal approximation.


16.12 (*) An insurance company is revising rates based on old data. The
expected number of claims for full credibility is selected so that observed
total claims will be within 5% of the true value 90% of the time. Individual
claim amounts have pdf $f(x) = 1/200{,}000$, $0 < x < 200{,}000$, and the number
of claims has the Poisson distribution. The recent experience consists of 1,082 
claims. Determine the credibility, Z, to be assigned to the recent experience. 
Use the normal approximation. 


16.13 (*) The average claim size for a group of insureds is 1,500 with a 
standard deviation of 7,500. Assume that claim counts have the Poisson 
distribution. Determine the expected number of claims so that the total loss 
will be within 6% of the expected total loss with probability 0.90.


16.14 (*) A group of insureds had 6,000 claims and a total loss of 15,600,000.
The prior estimate of the total loss was 16,500,000. Determine the limited fluc- 
tuation credibility estimate of the total loss for the group. Use the standard 
for full credibility determined in Exercise 16.13. 


16.15 (*) The full-credibility standard is set so that the total number of
claims is within 5% of the true value with probability $p$. This standard is
800 claims. The standard is then altered so that the total cost of claims is
to be within 10% of the true value with probability $p$. The claim frequency
has a Poisson distribution and the claim severity distribution has pdf
$f(x) = 0.0002(100 - x)$, $0 < x < 100$. Determine the expected number of claims
necessary to obtain full credibility under the new standard.


16.16 (*) A standard for full credibility of 1,000 claims has been selected 
so that the actual pure premium will be within 10% of the expected pure 
premium 95% of the time. The number of claims has the Poisson distribution. 
Determine the coefficient of variation of the severity distribution. 


16.17 (*) For a group of insureds you are given the following information: 
1. The prior estimate of expected total losses is 20,000,000. 
2. The observed total losses are 25,000,000. 
3. The observed number of claims is 10,000. 
4. The number of claims required for full credibility is 17,500. 


Determine the credibility estimate of the group’s expected total losses based
upon all the above information. Use the credibility factor that is appropriate
if the goal is to estimate the expected number of losses.


16.18 (*) A full-credibility standard is determined so that the total number of
claims is within 5% of the expected number with probability 98%. If the same
expected number of claims for full credibility is applied to the total cost of
claims, the actual total cost would be within $100K$% of the expected cost with
95% probability. Individual claims have severity pdf $f(x) = 2.5x^{-3.5}$, $x > 1$,
and the number of claims has the Poisson distribution. Determine $K$.


16.19 (*) The number of claims has the Poisson distribution. The number of 
claims and the claim severity are independent. Individual claim amounts can 
be for 1, 2, or 10 with probabilities 0.5, 0.3, and 0.2, respectively. Determine 
the expected number of claims needed so that the total cost of claims is within 
10% of the expected cost with 90% probability. 


16.20 (*) The number of claims has the Poisson distribution. The coefficient
of variation of the severity distribution is 2. The standard for full credibility
in estimating total claims is 3,415. With this standard the observed pure
premium will be within $k$% of the expected pure premium 95% of the time.
Determine $k$.


16.21 (*) You are given the following: 
1. P = Prior estimate of pure premium for a particular class of business. 


2. O = Observed pure premium during the latest experience period for the 
same class of business. 


3. R = Revised estimate of pure premium for the same class following the 
observations. 


4. F = Number of claims required for full credibility of the pure premium. 


Express the observed number of claims as a function of these four items. 


16.4 GREATEST ACCURACY CREDIBILITY THEORY 


16.4.1 Introduction 


In this and the following section, we consider a model-based approach to
the solution of the credibility problem. This approach, referred to as greatest
accuracy credibility theory, is the outgrowth of a classic 1967 paper by
Bühlmann [18]. Many of the ideas are also found in Whitney [136] and Bailey


We return to the basic problem. For a particular policyholder, we have
observed $n$ exposure units of past claims $\mathbf{X} = (X_1, \ldots, X_n)^T$. We have a
manual rate $\mu$ (we no longer use $M$ for the manual rate) which is applicable
to this policyholder, but the past experience indicates that this may not be
appropriate [$\bar{X} = n^{-1}(X_1 + \cdots + X_n)$, as well as $E(X)$, could be quite
different from $\mu$]. This raises the question of whether next year's net premium
(per exposure unit) should be based on $\mu$, on $\bar{X}$, or on a combination of the
two.

The insurer needs to consider the following question: Is the policyholder
really different from what has been assumed in the calculation of $\mu$, or has
it just been random chance which has been responsible for the differences
between $\mu$ and $\bar{X}$?

While it is difficult to definitively answer the above question, it is clear
that no underwriting system is perfect. The manual rate $\mu$ has presumably
been obtained by (a) evaluation of the underwriting characteristics of the
policyholder and (b) assignment of the rate on the basis of inclusion of the
policyholder in a rating class. Such a class should include risks with similar
underwriting characteristics. In other words, the rating class is viewed as
homogeneous with respect to the underwriting characteristics used. Surely,
not all risks in the class are truly homogeneous, however. No matter how
detailed the underwriting procedure, there still remains some heterogeneity
with respect to risk characteristics within the rating class (good and bad risks,
relatively speaking).

Thus, it is possible that the given policyholder may be different from what
has been assumed. If this is the case, how should one choose an appropriate
rate for the policyholder?

To proceed, let us assume that the risk level of each policyholder in the
rating class may be characterized by a risk parameter $\theta$ (possibly vector
valued), but the value of $\theta$ varies by policyholder. This allows us to quantify
the differences between policyholders with respect to the risk characteristics.
Because all observable underwriting characteristics have already been used,
$\theta$ may be viewed as representative of the residual, unobserved factors which
affect the risk level. Consequently, we shall assume the existence of $\theta$, but we
shall further assume that it is not observable and that we can never know its
true value.

Because $\theta$ varies by policyholder, there is a probability distribution with pf
$\pi(\theta)$ of these values across the rating class. Thus, if $\theta$ is a scalar parameter, the
cumulative distribution function $\Pi(\theta)$ may be interpreted as the proportion
of policyholders in the rating class with risk parameter $\Theta$ less than or equal
to $\theta$. [In statistical terms, $\Theta$ is a random variable with distribution function
$\Pi(\theta) = \Pr(\Theta \le \theta)$.] Stated another way, $\Pi(\theta)$ represents the probability that
a policyholder picked at random from the rating class has a risk parameter
less than or equal to $\theta$ (to accommodate the possibility of new insureds, we
slightly generalize the “rating class” interpretation to include the population
of all potential risks, whether insured or not).

While the $\theta$ value associated with an individual policyholder is not (and
cannot be) known, we assume (for this section) that $\pi(\theta)$ is known. That is,
the structure of the risk characteristics within the population is known. This
assumption can be relaxed, and we shall decide later how to estimate the
relevant characteristics of $\pi(\theta)$ because this is needed in order to implement
the theory.

Because risk levels vary within the population, it is clear that the experience
of the policyholder varies in a systematic way with $\theta$. Imagine that the
experience of a policyholder picked (at random) from the population arises
from a two-stage process. First, the risk parameter $\theta$ is selected from the
distribution $\pi(\theta)$. Then the claims or losses $X$ arise from the conditional
distribution of $X$ given $\theta$, $f_{X|\Theta}(x|\theta)$. Thus the experience varies with $\theta$ via
the distribution given the risk parameter $\theta$. The distribution of claims thus
differs from policyholder to policyholder to reflect the differences in the risk
parameters.


Example 16.15 Consider a rating class for automobile insurance, where $\theta$
represents the expected number of claims for a policyholder with risk parameter
$\theta$. To accommodate the variability in claims incidence, we assume that
the values of $\theta$ vary across the rating class. Relatively speaking, the good
drivers are those with small values of $\theta$, whereas the poor drivers are those
with larger values of $\theta$. It is convenient mathematically in this case to assume
that the number of claims for a policyholder with risk parameter $\theta$ is Poisson
distributed with mean $\theta$. The random variable $\Theta$ may also be assumed to be
gamma distributed with parameters $\alpha$ and $\beta$. Suppose it is known that the
average number of expected claims for this rating class is 0.15 [$E(\Theta) = 0.15$],
and 95% of the policyholders have expected claims between 0.10 and 0.20.
Determine $\alpha$ and $\beta$.


Table 16.2 Probabilities for Example 16.16

x   Pr(X = x | Θ = G)   Pr(X = x | Θ = B)   θ   Pr(Θ = θ)
0         0.7                 0.5           G     0.75
1         0.2                 0.3           B     0.25
2         0.1                 0.2


Assuming the normal approximation to the gamma, where it is known that
95% of the probability lies within about two standard deviations of the mean,
it follows that $\Theta$ has standard deviation 0.025. Thus $E(\Theta) = \alpha\beta = 0.15$ and
$\operatorname{Var}(\Theta) = \alpha\beta^2 = (0.025)^2$. Solving for $\alpha$ and $\beta$ yields $\beta = 1/240$ and $\alpha = 36$. □
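The two moment equations in Example 16.15 can be solved mechanically. The sketch below assumes the parameterization used in the example (mean $\alpha\beta$, variance $\alpha\beta^2$); the function name `gamma_from_mean_sd` is illustrative.

```python
def gamma_from_mean_sd(mean: float, sd: float) -> tuple[float, float]:
    """Match a gamma distribution with mean alpha*beta and variance alpha*beta^2."""
    beta = sd ** 2 / mean    # variance / mean = beta
    alpha = mean / beta      # mean / beta = alpha
    return alpha, beta


# The 95% range 0.10 to 0.20 is read as the mean 0.15 plus or minus two
# standard deviations, so the standard deviation is 0.025.
alpha, beta = gamma_from_mean_sd(mean=0.15, sd=(0.20 - 0.10) / 4)
print(alpha, 1 / beta)
```

Dividing the variance by the mean isolates $\beta$, after which $\alpha$ follows from the mean alone.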


Example 16.16 There are two types of driver. Good drivers make up 75%
of the population and in one year have zero claims with probability 0.7, one
claim with probability 0.2, and two claims with probability 0.1. Bad drivers
make up the other 25% of the population and have zero, one, or two claims
with probabilities 0.5, 0.3, and 0.2, respectively. Describe this process and how
it relates to an unknown risk parameter.


When a driver buys our insurance, we do not know if the individual is a
good or bad driver. So the risk parameter $\Theta$ can be one of two values. We
can set $\Theta = G$ for good drivers and $\Theta = B$ for bad drivers. The probability
model for the number of claims, $X$, and risk parameter $\Theta$ is given in Table
16.2. □


Example 16.17 The amount of a claim has the exponential distribution with
mean $1/\Theta$. Among the class of insureds and potential insureds, the parameter
$\Theta$ varies according to the gamma distribution with $\alpha = 4$ and scale parameter
$\beta = 0.001$. Provide a mathematical description of this model.


For claims,

$$f_{X|\Theta}(x|\theta) = \theta e^{-\theta x}, \quad x, \theta > 0,$$

and for the risk parameter,

$$\pi_{\Theta}(\theta) = \frac{\theta^3 e^{-1{,}000\theta}\,1{,}000^4}{6}, \quad \theta > 0.$$
□


16.4.2 The Bayesian methodology


Continue to assume that the distribution of the risk characteristics in the
population may be represented by $\pi(\theta)$, and the experience of a particular
policyholder with risk parameter $\theta$ arises from the conditional distribution
$f_{X|\Theta}(x|\theta)$ of claims or losses given $\theta$.

We now return to the problem introduced in Section 16.3. That is, for a
particular policyholder, we have observed $\mathbf{X} = \mathbf{x}$, where $\mathbf{X} = (X_1, \ldots, X_n)^T$
and $\mathbf{x} = (x_1, \ldots, x_n)^T$, and are interested in setting a rate to cover $X_{n+1}$. We
assume that the risk parameter associated with the policyholder is $\theta$ (which
is unknown). Furthermore, the experience of the policyholder corresponding
to different exposure periods is assumed to be independent. In statistical
terms, conditional on $\theta$, the claims or losses $X_1, \ldots, X_n, X_{n+1}$ are independent
(although not necessarily identically distributed).

Let $X_j$ have conditional pf

$$f_{X_j|\Theta}(x_j|\theta), \quad j = 1, \ldots, n, n+1.$$

Note that, if the $X_j$ are identically distributed (conditional on $\Theta = \theta$), then
$f_{X_j|\Theta}(x_j|\theta)$ does not depend on $j$. Ideally, we are interested in the conditional
distribution of $X_{n+1}$ given $\Theta = \theta$ in order to predict the claims experience
$X_{n+1}$ of the same policyholder (whose value of $\theta$ has been assumed not to
have changed). If we knew $\theta$, we could use $f_{X_{n+1}|\Theta}(x_{n+1}|\theta)$. Unfortunately,
we do not know $\theta$, but we do know $\mathbf{x}$ for the same policyholder. The obvious
next step is to condition on $\mathbf{x}$ rather than $\theta$. Consequently, we will calculate
the conditional distribution of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$, termed the predictive
distribution as defined in Section 12.4.

The predictive distribution of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$ is the relevant
distribution for risk analysis, management, and decision making. It combines the
uncertainty about the claims losses with that of the parameters associated
with the risk process.

Here we repeat the development in Section 12.4, noting that if $\Theta$ has a
discrete distribution the integrals are replaced by sums. Because the $X_j$s are
independent conditional on $\Theta = \theta$, we have

$$f_{\mathbf{X},\Theta}(\mathbf{x}, \theta) = f(x_1, \ldots, x_n|\theta)\pi(\theta)
  = \left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta).$$

The joint distribution of $\mathbf{X}$ is thus the marginal distribution obtained by
integrating $\theta$ out, that is,

$$f_{\mathbf{X}}(\mathbf{x}) = \int \left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta)\,d\theta. \tag{16.22}$$

Similarly, the joint distribution of $X_1, \ldots, X_{n+1}$ is the right-hand side of
(16.22) with $n$ replaced by $n + 1$ in the product. Finally, the conditional
density of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$ is the joint density of $(X_1, \ldots, X_{n+1})$ divided
by that of $\mathbf{X}$, namely,

$$f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x}) = \frac{1}{f_{\mathbf{X}}(\mathbf{x})}
  \int \left[\prod_{j=1}^{n+1} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta)\,d\theta. \tag{16.23}$$


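There is a hidden mathematical structure underlying (16.23) which may
often be exploited. The posterior density of $\Theta$ given $\mathbf{X}$ is

$$\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) = \frac{f_{\mathbf{X},\Theta}(\mathbf{x}, \theta)}{f_{\mathbf{X}}(\mathbf{x})}
  = \frac{1}{f_{\mathbf{X}}(\mathbf{x})}\left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta). \tag{16.24}$$

In other words, $\left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta) = \pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})f_{\mathbf{X}}(\mathbf{x})$, and substitution
in the numerator of (16.23) yields

$$f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x}) = \int f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta. \tag{16.25}$$

Equation (16.25) provides the additional insight that the conditional
distribution of $X_{n+1}$ given $\mathbf{X}$ may be viewed as a mixture distribution, with the
mixing distribution the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$.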

The posterior distribution combines and summarizes the information about
$\theta$ contained in the prior distribution and the likelihood and consequently
(16.25) reflects this information. As noted in Theorem 12.49, the posterior
distribution admits a convenient form when the likelihood is derived from the
linear exponential family and $\pi(\theta)$ is the natural conjugate prior. This
provides an easy method to evaluate the conditional distribution of $X_{n+1}$ given
$\mathbf{X}$ in these cases.


Example 16.18 (Example 16.16 continued) For a particular policyholder
suppose we have observed $x_1 = 0$ and $x_2 = 1$. Determine the predictive
distribution of $X_3|X_1 = 0, X_2 = 1$ and the posterior distribution of
$\Theta|X_1 = 0, X_2 = 1$.


From (16.22), the marginal probability is

$$\begin{aligned}
f_{\mathbf{X}}(0, 1) &= \sum_{\theta} f_{X_1|\Theta}(0|\theta)\,f_{X_2|\Theta}(1|\theta)\,\pi(\theta) \\
  &= 0.7(0.2)(0.75) + 0.5(0.3)(0.25) \\
  &= 0.1425.
\end{aligned}$$

Similarly, the joint probability of all three variables is

$$f_{\mathbf{X},X_3}(0, 1, x_3) = \sum_{\theta} f_{X_1|\Theta}(0|\theta)\,f_{X_2|\Theta}(1|\theta)\,f_{X_3|\Theta}(x_3|\theta)\,\pi(\theta).$$

Thus,

$$\begin{aligned}
f_{\mathbf{X},X_3}(0, 1, 0) &= 0.7(0.2)(0.7)(0.75) + 0.5(0.3)(0.5)(0.25) = 0.09225, \\
f_{\mathbf{X},X_3}(0, 1, 1) &= 0.7(0.2)(0.2)(0.75) + 0.5(0.3)(0.3)(0.25) = 0.03225, \\
f_{\mathbf{X},X_3}(0, 1, 2) &= 0.7(0.2)(0.1)(0.75) + 0.5(0.3)(0.2)(0.25) = 0.01800.
\end{aligned}$$

The predictive distribution is then

$$\begin{aligned}
f_{X_3|\mathbf{X}}(0|0, 1) &= \frac{0.09225}{0.1425} = 0.647368, \\
f_{X_3|\mathbf{X}}(1|0, 1) &= \frac{0.03225}{0.1425} = 0.226316, \\
f_{X_3|\mathbf{X}}(2|0, 1) &= \frac{0.01800}{0.1425} = 0.126316.
\end{aligned}$$

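The posterior probabilities are, from (16.24),

$$\begin{aligned}
\pi(G|0, 1) &= \frac{f(0|G)\,f(1|G)\,\pi(G)}{f(0, 1)} = \frac{0.7(0.2)(0.75)}{0.1425} = 0.736842, \\
\pi(B|0, 1) &= \frac{f(0|B)\,f(1|B)\,\pi(B)}{f(0, 1)} = \frac{0.5(0.3)(0.25)}{0.1425} = 0.263158.
\end{aligned}$$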


From this point forward the subscripts on $f$ and $\pi$ will be dropped unless
needed for clarity. The predictive probabilities could also have been obtained
using (16.25). This method is often simpler from a computation viewpoint.

$$\begin{aligned}
f(0|0, 1) &= \sum_{\theta} f(0|\theta)\,\pi(\theta|0, 1) \\
  &= 0.7(0.736842) + 0.5(0.263158) = 0.647368, \\
f(1|0, 1) &= 0.2(0.736842) + 0.3(0.263158) = 0.226316, \\
f(2|0, 1) &= 0.1(0.736842) + 0.2(0.263158) = 0.126316,
\end{aligned}$$

which matches the previous calculations. □
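For a discrete risk parameter, (16.24) and (16.25) reduce to sums that are easy to code. The following sketch reproduces Example 16.18 from the probabilities in Table 16.2; the dictionary layout and the function names `posterior` and `predictive` are illustrative choices.

```python
# Prior pi(theta) and conditional pf f(x | theta) from Table 16.2.
prior = {"G": 0.75, "B": 0.25}
claim_pf = {
    "G": {0: 0.7, 1: 0.2, 2: 0.1},
    "B": {0: 0.5, 1: 0.3, 2: 0.2},
}


def posterior(observations):
    """Return (posterior pf of Theta, marginal probability of the data), as in (16.24)."""
    joint = {}
    for theta, weight in prior.items():
        for x in observations:           # conditional independence given theta
            weight *= claim_pf[theta][x]
        joint[theta] = weight
    marginal = sum(joint.values())       # discrete version of (16.22)
    return {theta: w / marginal for theta, w in joint.items()}, marginal


def predictive(post):
    """Mix f(x | theta) over the posterior, the discrete version of (16.25)."""
    return {x: sum(claim_pf[t][x] * post[t] for t in post) for x in (0, 1, 2)}


post, marginal = posterior([0, 1])
pred = predictive(post)
print(round(marginal, 4))
print({t: round(p, 6) for t, p in post.items()})
print({x: round(p, 6) for x, p in pred.items()})
```

Adding a further year of experience only requires extending the list passed to `posterior`.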


Example 16.19 (Example 16.17 continued) Suppose a person had claims of
100, 950, and 450. Determine the predictive distribution of the fourth claim
and the posterior distribution of $\Theta$.


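The marginal density at the observed values is

$$\begin{aligned}
f(100, 950, 450)
  &= \int_0^{\infty} \theta e^{-100\theta}\,\theta e^{-950\theta}\,\theta e^{-450\theta}\,
     \frac{1{,}000^4}{6}\theta^3 e^{-1{,}000\theta}\,d\theta \\
  &= \frac{1{,}000^4}{6}\int_0^{\infty} \theta^6 e^{-2{,}500\theta}\,d\theta
   = \frac{1{,}000^4}{6}\,\frac{720}{2{,}500^7}.
\end{aligned}$$

Similarly,

$$\begin{aligned}
f(100, 950, 450, x_4)
  &= \int_0^{\infty} \theta e^{-100\theta}\,\theta e^{-950\theta}\,\theta e^{-450\theta}\,\theta e^{-\theta x_4}\,
     \frac{1{,}000^4}{6}\theta^3 e^{-1{,}000\theta}\,d\theta \\
  &= \frac{1{,}000^4}{6}\int_0^{\infty} \theta^7 e^{-(2{,}500 + x_4)\theta}\,d\theta \\
  &= \frac{1{,}000^4}{6}\,\frac{5{,}040}{(2{,}500 + x_4)^8}.
\end{aligned}$$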

Then the predictive density is

$$
f(x_4|100, 950, 450) = \frac{\dfrac{1{,}000^4}{6}\,\dfrac{5{,}040}{(2{,}500 + x_4)^8}}{\dfrac{1{,}000^4}{6}\,\dfrac{720}{2{,}500^7}} = \frac{7(2{,}500)^7}{(2{,}500 + x_4)^8},
$$

which is a Pareto density with parameters 7 and 2,500.

For the posterior distribution we take a shortcut. The denominator is an
integral that produces a number and can be ignored for now. The numerator
can be written

$$
\pi(\theta|100, 950, 450) \propto \theta e^{-100\theta}\,\theta e^{-950\theta}\,\theta e^{-450\theta}\,\frac{1{,}000^4}{6}\,\theta^3 e^{-1{,}000\theta},
$$

which was the term to be integrated in the calculation of the marginal density.
Because there are constants in the denominator that have been ignored, we
might as well ignore constants in the numerator. Only multiplicative terms
involving the variable ($\theta$ in this case) need to be retained. Then

$$
\pi(\theta|100, 950, 450) \propto \theta^6 e^{-2{,}500\theta}.
$$


We could integrate this expression in order to determine the constant needed
to make this a density function (that is, make the integral equal 1). But we
recognize this function as that of a gamma distribution with parameters 7 and
1/2,500. Therefore,

$$
\pi(\theta|100, 950, 450) = \frac{\theta^6 e^{-2{,}500\theta}\,2{,}500^7}{\Gamma(7)}.
$$

Then the predictive density can be alternatively calculated from

$$
\begin{aligned}
f(x_4|100, 950, 450) &= \int_0^\infty \theta e^{-\theta x_4}\,\frac{\theta^6 e^{-2{,}500\theta}\,2{,}500^7}{\Gamma(7)}\,d\theta \\
&= \frac{2{,}500^7}{6!}\int_0^\infty \theta^7 e^{-(2{,}500 + x_4)\theta}\,d\theta \\
&= \frac{2{,}500^7}{6!}\,\frac{7!}{(2{,}500 + x_4)^8},
\end{aligned}
$$

matching the answer previously obtained. □
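As a numerical cross-check of Example 16.19, the sketch below updates the gamma prior and compares the closed-form Pareto predictive density with a direct numerical integration of $\theta e^{-\theta x_4}$ against the posterior. It assumes the prior is gamma with shape 4 and rate 1,000, as implied by the integrand above; the function names and the midpoint-rule settings are illustrative choices, not part of the text.

```python
import math


def gamma_posterior(shape, rate, claims):
    """Gamma(shape, rate) prior on theta with exponential(theta) claims."""
    return shape + len(claims), rate + sum(claims)


def pareto_density(x, shape, scale):
    """Pareto density: shape * scale**shape / (scale + x)**(shape + 1)."""
    return shape * scale ** shape / (scale + x) ** (shape + 1)


def predictive_by_integration(x, shape, rate, steps=20_000):
    """Midpoint-rule integral of theta * exp(-theta * x) times the gamma posterior density."""
    upper = 40.0 / rate  # posterior mass beyond this point is negligible
    h = upper / steps
    log_norm = shape * math.log(rate) - math.lgamma(shape)
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        log_post = log_norm + (shape - 1) * math.log(t) - rate * t
        total += t * math.exp(-t * x + log_post) * h
    return total


shape, rate = gamma_posterior(4, 1000.0, [100.0, 950.0, 450.0])
exact = pareto_density(500.0, shape, rate)
approx = predictive_by_integration(500.0, shape, rate)
print(shape, rate, exact, approx)
```

The evaluation point 500 is arbitrary; any nonnegative value of the fourth claim could be used.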


Note that the posterior distribution is of the same type (gamma) as the
prior distribution. The concept of a conjugate prior distribution was
introduced in Section 12.4.3. This also implies that $X_{n+1}|\mathbf{x}$ is a
mixture distribution with a simple mixing distribution, facilitating evaluation
of the density of $X_{n+1}|\mathbf{x}$. Further examples of this idea are found
in the exercises.

To return to the original problem, we have observed $\mathbf{X} = \mathbf{x}$ for a particular
policyholder and we wish to predict $X_{n+1}$ (or its mean). An obvious choice
would be the hypothetical mean (or individual premium)

$$
\mu_{n+1}(\theta) = E(X_{n+1}|\Theta = \theta) = \int x_{n+1}\, f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,dx_{n+1} \tag{16.26}
$$

if we knew $\theta$. Note that replacement of $\theta$ by $\Theta$ in (16.26) yields, upon taking
the expectation,

$$
\mu_{n+1} = E(X_{n+1}) = E[E(X_{n+1}|\Theta)] = E[\mu_{n+1}(\Theta)]
$$

so that the pure, or collective, premium is the mean of the hypothetical means.
This is the premium we would use if we knew nothing about the individual.
It does not depend on the individual's risk parameter, $\theta$, nor does it use $\mathbf{x}$,
the data collected from the individual. Because $\theta$ is unknown, the best we
can do is try to use the data. This suggests the use of the Bayesian premium
(the mean of the predictive distribution)

$$
E(X_{n+1}|\mathbf{X} = \mathbf{x}) = \int x_{n+1}\, f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x})\,dx_{n+1}. \tag{16.27}
$$

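A computationally more convenient form is

$$
E(X_{n+1}|\mathbf{X} = \mathbf{x}) = \int \mu_{n+1}(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta. \tag{16.28}
$$

In other words, the Bayesian premium is the expected value of the hypothetical
means, with expectation taken over the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$. We
remind the reader that the integrals are replaced by sums in the discrete case.
To prove (16.28), we see from (16.25) that

$$
\begin{aligned}
E(X_{n+1}|\mathbf{X} = \mathbf{x}) &= \int x_{n+1}\, f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x})\,dx_{n+1} \\
&= \int x_{n+1}\left[\int f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta\right]dx_{n+1} \\
&= \int\left[\int x_{n+1}\, f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,dx_{n+1}\right]\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta \\
&= \int \mu_{n+1}(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta.
\end{aligned}
$$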
Example 16.20 (Example 16.18 continued) Determine the Bayesian premium
using both (16.27) and (16.28).

The (unobservable) hypothetical means are

$$
\begin{aligned}
\mu_3(G) &= (0)(0.7) + 1(0.2) + 2(0.1) = 0.4, \\
\mu_3(B) &= (0)(0.5) + 1(0.3) + 2(0.2) = 0.7.
\end{aligned}
$$

If, as in Example 16.18, we have observed $x_1 = 0$ and $x_2 = 1$, we have the
Bayesian premium obtained directly from (16.27):

$$
E(X_3|0, 1) = 0(0.647368) + 1(0.226316) + 2(0.126316) = 0.478948.
$$

The (unconditional) pure premium is

$$
\mu_3 = E(X_3) = \sum_{\theta} \mu_3(\theta)\,\pi(\theta) = (0.4)(0.75) + (0.7)(0.25) = 0.475.
$$

To verify (16.28) with $x_1 = 0$ and $x_2 = 1$, we have the posterior distribution
$\pi(\theta|0, 1)$ from Example 16.18. Thus, (16.28) yields

$$
E(X_3|0, 1) = 0.4(0.736842) + 0.7(0.263158) = 0.478947
$$

with the difference due to rounding. In general, the latter approach utilizing
(16.28) is simpler than the direct approach using the conditional distribution
of $X_{n+1}|\mathbf{X} = \mathbf{x}$. □


As expected, the revised value based on two observations is between the 
prior value (0.475) based on no data and the value based only on the data 
(0.5). 
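The two routes to the Bayesian premium, (16.27) and (16.28), can be compared without intermediate rounding. The following is a minimal sketch for the two-class model of Example 16.20; all variable names are illustrative assumptions.

```python
# Two-class model as used in Examples 16.18 and 16.20.
prior = {"G": 0.75, "B": 0.25}
pmf = {"G": [0.7, 0.2, 0.1], "B": [0.5, 0.3, 0.2]}  # f(x | class), x = 0, 1, 2
observed = [0, 1]

# Posterior class probabilities via Bayes' theorem.
weights = {c: prior[c] * pmf[c][observed[0]] * pmf[c][observed[1]] for c in prior}
total = sum(weights.values())
post = {c: w / total for c, w in weights.items()}

# Hypothetical means mu(theta) for each class.
hyp_mean = {c: sum(x * p for x, p in enumerate(pmf[c])) for c in prior}

# (16.27): mean of the predictive distribution.
pred = [sum(pmf[c][x] * post[c] for c in prior) for x in range(3)]
premium_direct = sum(x * p for x, p in enumerate(pred))

# (16.28): posterior expectation of the hypothetical means.
premium_posterior = sum(hyp_mean[c] * post[c] for c in prior)

# Pure premium: prior expectation of the hypothetical means.
pure_premium = sum(hyp_mean[c] * prior[c] for c in prior)

print(premium_direct, premium_posterior, pure_premium)
```

Carried at full precision, the two premiums agree, which is consistent with the remark that the small discrepancy in the example is due to rounding.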


Example 16.21 (Example 16.19 continued) Determine the Bayesian pre- 
mium. 


From Example 16.19, we have $\mu_4(\theta) = \theta^{-1}$. Then, (16.28) yields

$$
\begin{aligned}
E(X_4|100, 950, 450) &= \int_0^\infty \theta^{-1}\,\frac{\theta^6 e^{-2{,}500\theta}\,2{,}500^7}{720}\,d\theta \\
&= \frac{2{,}500^7}{720}\,\frac{120}{2{,}500^6} = 416.67.
\end{aligned}
$$

This could also have been obtained from the formula for the moments of the
gamma distribution in Appendix A. From the prior distribution,

$$
\mu = E(\Theta^{-1}) = \frac{1{,}000}{3} = 333.33
$$

and once again the Bayesian estimate is between the prior estimate and one
based solely on the data (the sample mean of 500).
From (16.27),

$$
E(X_4|100, 950, 450) = \frac{2{,}500}{6} = 416.67,
$$

the mean of the predictive Pareto distribution. □


Example 16.22 Generalize the result of Example 16.21 for an arbitrary sample
size of $n$ and an arbitrary prior gamma distribution with parameters $\alpha$ and
$\beta$, where $\beta$ is the reciprocal of the usual scale parameter.

The posterior distribution can be determined from

$$
\begin{aligned}
\pi(\theta|\mathbf{x}) &\propto \left(\prod_{j=1}^n \theta e^{-\theta x_j}\right)\frac{\theta^{\alpha-1} e^{-\beta\theta}\,\beta^{\alpha}}{\Gamma(\alpha)} \\
&\propto \theta^{n+\alpha-1} e^{-(\Sigma x_j + \beta)\theta}.
\end{aligned}
$$

The second line follows because the posterior density is a function of $\theta$ and
thus all multiplicative terms not involving $\theta$ may be dropped. Rather than
perform the integral to determine the constant, we recognize that the posterior
distribution is gamma with first parameter $n + \alpha$ and scale parameter
$(\Sigma x_j + \beta)^{-1}$. The Bayes estimate of $X_{n+1}$ is the expected value of $\Theta^{-1}$ using the
posterior distribution. It is

$$
\frac{\Sigma x_j + \beta}{n + \alpha - 1} = \frac{n}{n + \alpha - 1}\,\bar{x} + \frac{\alpha - 1}{n + \alpha - 1}\,\frac{\beta}{\alpha - 1}.
$$

Note that the estimate is a weighted average of the observed values and the
unconditional mean. This formula is of the credibility weighted type (16.19). □
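The closed form of Example 16.22 can be written as two small functions, one for the posterior-mean form and one for the credibility-weighted form. This is a sketch under the example's assumptions (exponential claims, gamma prior with shape `alpha` and rate `beta`, with `alpha` greater than 1); the function names are hypothetical.

```python
def bayes_premium(claims, alpha, beta):
    """Posterior mean of 1/Theta: (sum of claims + beta) / (n + alpha - 1)."""
    n = len(claims)
    return (sum(claims) + beta) / (n + alpha - 1)


def credibility_form(claims, alpha, beta):
    """The same estimate written as Z * sample mean + (1 - Z) * prior mean."""
    n = len(claims)
    z = n / (n + alpha - 1)
    sample_mean = sum(claims) / n
    prior_mean = beta / (alpha - 1)
    return z * sample_mean + (1 - z) * prior_mean


claims = [100.0, 950.0, 450.0]
print(bayes_premium(claims, 4, 1000.0))     # data of Example 16.21
print(credibility_form(claims, 4, 1000.0))
```

With the data of Example 16.21 both functions reproduce the value 2,500/6.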


Here is an example where the random variables do not have identical dis- 
tributions. 


Example 16.23 Suppose that the number of claims $N_j$ in year $j$ for a group
policyholder with (unknown) risk parameter $\theta$ and $m_j$ individuals in the group
is Poisson distributed with mean $m_j\theta$, that is, for $j = 1, \ldots, n$,

$$
\Pr(N_j = x|\Theta = \theta) = \frac{(m_j\theta)^x e^{-m_j\theta}}{x!}, \quad x = 0, 1, 2, \ldots.
$$

This would be the case if, per individual, the number of claims were
independently Poisson distributed with mean $\theta$. Determine the Bayesian expected
number of claims for the $m_{n+1}$ individuals to be insured in year $n + 1$.
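With these assumptions, the average number of claims per individual in
year $j$ is

$$
X_j = \frac{N_j}{m_j}, \quad j = 1, \ldots, n.
$$

Therefore,

$$
f_{X_j|\Theta}(x_j|\theta) = \Pr[N_j = m_j x_j|\Theta = \theta].
$$

Assume $\Theta$ is gamma distributed with parameters $\alpha$ and $\beta$,

$$
\pi(\theta) = \frac{\theta^{\alpha-1} e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}}, \quad \theta > 0;
$$

then the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ is proportional (as a function of $\theta$)
to

$$
\left[\prod_{j=1}^n f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta),
$$

which is itself proportional to

$$
\left[\prod_{j=1}^n \theta^{m_j x_j} e^{-m_j\theta}\right]\theta^{\alpha-1} e^{-\theta/\beta} = \theta^{\alpha + \sum_{j=1}^n m_j x_j - 1}\, e^{-\theta\left(\beta^{-1} + \sum_{j=1}^n m_j\right)}.
$$

This is proportional to a gamma density with parameters $\alpha_* = \alpha + \sum_{j=1}^n m_j x_j$
and $\beta_* = (1/\beta + \sum_{j=1}^n m_j)^{-1}$, and so $\Theta|\mathbf{X}$ is also gamma, but with $\alpha$ and $\beta$
replaced by $\alpha_*$ and $\beta_*$, respectively.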

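Now,

$$
E(X_j|\Theta = \theta) = E\left(\frac{N_j}{m_j}\,\Big|\,\Theta = \theta\right) = \frac{1}{m_j}E(N_j|\Theta = \theta) = \frac{1}{m_j}\,m_j\theta = \theta.
$$

Thus $\mu_{n+1}(\theta) = E(X_{n+1}|\Theta = \theta) = \theta$ and $\mu_{n+1} = E(X_{n+1}) = E[\mu_{n+1}(\Theta)] = \alpha\beta$
because $\Theta$ is gamma distributed with parameters $\alpha$ and $\beta$. From (16.28)
and because $\Theta|\mathbf{X}$ is also gamma distributed with parameters $\alpha_*$ and $\beta_*$,

$$
\begin{aligned}
E(X_{n+1}|\mathbf{X} = \mathbf{x}) &= \int_0^\infty \mu_{n+1}(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta \\
&= E[\mu_{n+1}(\Theta)|\mathbf{X} = \mathbf{x}] \\
&= E(\Theta|\mathbf{X} = \mathbf{x}) \\
&= \alpha_*\beta_*.
\end{aligned}
$$

Define the total number of lives observed to be $m = \sum_{j=1}^n m_j$.
Then,

$$
E(X_{n+1}|\mathbf{X} = \mathbf{x}) = Z\bar{x} + (1 - Z)\mu_{n+1},
$$

where $Z = m/(m + \beta^{-1})$, $\bar{x} = m^{-1}\sum_{j=1}^n m_j x_j$, and $\mu_{n+1} = \alpha\beta$, again an
expression of the form (16.19).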


The total Bayesian expected number of claims for $m_{n+1}$ individuals in the
group for the next year would be $m_{n+1}E(X_{n+1}|\mathbf{X} = \mathbf{x})$.

The analysis based on independent and identically distributed Poisson
claim counts is obtained with $m_j = 1$. Then $X_j = N_j$ for $j = 1, 2, \ldots, n$
are independent (given $\theta$) Poisson random variables with mean $\theta$. In this case

$$
E(X_{n+1}|\mathbf{X} = \mathbf{x}) = Z\bar{x} + (1 - Z)\mu,
$$

where $Z = n/(n + \beta^{-1})$, $\bar{x} = n^{-1}\sum_{j=1}^n x_j$, and $\mu = \alpha\beta$. □


In each of Examples 16.22 and 16.23 the Bayesian estimate was a weighted
average of the sample mean $\bar{x}$ and the pure premium $\mu_{n+1}$. This is appealing
from a credibility standpoint. Furthermore, the credibility factor $Z$ in each
case is an increasing function of the number of exposure units. The greater
the amount of past data observed, the closer $Z$ is to 1, consistent with our
intuition.
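The Poisson-gamma result of Example 16.23 can be sketched in code as a posterior update followed by the credibility-weighted form. The claim counts and group sizes below are made-up illustrative values, and the function names are assumptions; `beta` is the gamma scale parameter, as in the example.

```python
def bayes_claims_per_individual(counts, exposures, alpha, beta):
    """Posterior mean alpha_* * beta_* of Theta given yearly counts N_j and group sizes m_j."""
    m = sum(exposures)
    alpha_star = alpha + sum(counts)   # sum of m_j * x_j equals the total claim count
    beta_star = 1.0 / (1.0 / beta + m)
    return alpha_star * beta_star


def credibility_weighted(counts, exposures, alpha, beta):
    """The same estimate written as Z * xbar + (1 - Z) * alpha * beta."""
    m = sum(exposures)
    z = m / (m + 1.0 / beta)
    xbar = sum(counts) / m             # exposure-weighted average claims per individual
    return z * xbar + (1 - z) * alpha * beta


counts = [3, 5, 2]        # hypothetical N_j
exposures = [10, 20, 15]  # hypothetical m_j
per_life = bayes_claims_per_individual(counts, exposures, alpha=2.0, beta=0.1)
print(per_life, credibility_weighted(counts, exposures, alpha=2.0, beta=0.1))
print(25 * per_life)      # expected claims if 25 individuals are insured next year
```

Setting every exposure to 1 gives the independent and identically distributed case described at the end of the example.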


16.4.3 The credibility premium 


In the previous section a systematic approach was suggested for treatment
of the past data of a particular policyholder. Ideally, rather than the pure
premium $\mu_{n+1} = E(X_{n+1})$, one would like to charge the individual premium
(or hypothetical mean) $\mu_{n+1}(\theta)$, where $\theta$ is the (hypothetical) parameter
associated with the policyholder. Because $\theta$ is unknown, this is impossible, but
we could instead condition on $\mathbf{x}$, the past data from the policyholder. This
leads to the Bayesian premium $E(X_{n+1}|\mathbf{x})$.

The major challenge with this approach is that it may be difficult to
evaluate the Bayesian premium. Of course, in simple examples such as in the
previous subsection, the Bayesian premium is not difficult to evaluate
numerically. But these examples can hardly be expected to capture the essential
features of a realistic insurance scenario. More realistic models may well
introduce analytic difficulties with respect to evaluation of $E(X_{n+1}|\mathbf{x})$, whether
one uses (16.27) or (16.28). Often, numerical integration may be required.
There are exceptions such as Examples 16.22 and 16.23.

We now present an alternative suggested by Bühlmann [18] in 1967. Recall
the basic problem: We wish to use the conditional distribution $f_{X_{n+1}|\Theta}(x_{n+1}|\theta)$
or the hypothetical mean $\mu_{n+1}(\theta)$ for estimation of next year's claims.
Because we have observed $\mathbf{x}$, one suggestion is to approximate $\mu_{n+1}(\theta)$ by a linear
function of the past data. [After all, the formula $Z\bar{X} + (1 - Z)\mu$ is of this
form.] Thus, let us restrict ourselves to estimators of the form $\alpha_0 + \sum_{j=1}^n \alpha_j X_j$,
where $\alpha_0, \alpha_1, \ldots, \alpha_n$ need to be chosen. To this end, we will choose the $\alpha$s to
minimize squared error loss, that is,

$$
Q = E\left\{\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^n \alpha_j X_j\right]^2\right\} \tag{16.29}
$$

and the expectation is over the joint distribution of $X_1, \ldots, X_n$ and $\Theta$. That
is, the squared error is averaged over all possible values of $\Theta$ and all possible
observations. To minimize $Q$, we take derivatives. Thus,


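$$
\frac{\partial Q}{\partial \alpha_0} = E\left\{2\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^n \alpha_j X_j\right](-1)\right\}.
$$

We shall denote by $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ the values of $\alpha_0, \alpha_1, \ldots, \alpha_n$ which minimize
(16.29). Then equating $\partial Q/\partial \alpha_0$ to 0 yields

$$
E[\mu_{n+1}(\Theta)] = \tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j E(X_j).
$$

But $E(X_{n+1}) = E[E(X_{n+1}|\Theta)] = E[\mu_{n+1}(\Theta)]$, and so $\partial Q/\partial \alpha_0 = 0$ implies that

$$
E(X_{n+1}) = \tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j E(X_j). \tag{16.30}
$$

Equation (16.30) may be termed the unbiasedness equation because it
requires that the estimate $\tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j X_j$ be unbiased for $E(X_{n+1})$.
However, the credibility estimate may be biased as an estimator of
$\mu_{n+1}(\theta) = E(X_{n+1}|\theta)$, the quantity we are trying to estimate. This bias will average
out over the members of $\Theta$. By accepting this bias we are able to reduce the
overall mean-squared error. For $i = 1, \ldots, n$, we have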


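$$
\frac{\partial Q}{\partial \alpha_i} = E\left\{2\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^n \alpha_j X_j\right](-X_i)\right\}
$$

and setting this equal to 0 yields

$$
E[\mu_{n+1}(\Theta)X_i] = \tilde{\alpha}_0 E(X_i) + \sum_{j=1}^n \tilde{\alpha}_j E(X_i X_j).
$$

The left-hand side of this equation may be reexpressed as

$$
\begin{aligned}
E[\mu_{n+1}(\Theta)X_i] &= E\{E[X_i\,\mu_{n+1}(\Theta)|\Theta]\} \\
&= E\{\mu_{n+1}(\Theta)E[X_i|\Theta]\} \\
&= E[E(X_{n+1}|\Theta)E(X_i|\Theta)] \\
&= E[E(X_{n+1}X_i|\Theta)] \\
&= E(X_i X_{n+1}),
\end{aligned}
$$

where the second from the last step follows by independence of $X_i$ and $X_{n+1}$
conditional on $\Theta$. Thus $\partial Q/\partial \alpha_i = 0$ implies

$$
E(X_i X_{n+1}) = \tilde{\alpha}_0 E(X_i) + \sum_{j=1}^n \tilde{\alpha}_j E(X_i X_j). \tag{16.31}
$$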
Next multiply (16.30) by $E(X_i)$ and subtract from (16.31) to obtain

$$
\mathrm{Cov}(X_i, X_{n+1}) = \sum_{j=1}^n \tilde{\alpha}_j\,\mathrm{Cov}(X_i, X_j), \quad i = 1, \ldots, n. \tag{16.32}
$$

Equation (16.30) and the $n$ equations (16.32) together are called the normal
equations. These equations may be solved for $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ to yield the
credibility premium

$$
\tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j X_j. \tag{16.33}
$$

While it is straightforward to express the solution $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ to the normal
equations in matrix notation (if the covariance matrix of the $X_j$s is
nonsingular), we shall be content with solutions for some special cases.

Note that exactly one of the terms on the right-hand side of (16.32) is a
variance term, that is, $\mathrm{Cov}(X_i, X_i) = \mathrm{Var}(X_i)$. The other $n - 1$ terms are
true covariance terms.


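As an added bonus, the values $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ also minimize

$$
Q_1 = E\left\{\left[E(X_{n+1}|\mathbf{X}) - \alpha_0 - \sum_{j=1}^n \alpha_j X_j\right]^2\right\} \tag{16.34}
$$

and

$$
Q_2 = E\left[\left(X_{n+1} - \alpha_0 - \sum_{j=1}^n \alpha_j X_j\right)^2\right]. \tag{16.35}
$$

To see this, differentiate (16.34) or (16.35) with respect to $\alpha_0, \alpha_1, \ldots, \alpha_n$
and observe that the solutions still satisfy the normal equations (16.30) and
(16.32). Thus the credibility premium (16.33) is the best linear estimator of
each of the hypothetical mean $E(X_{n+1}|\Theta)$, the Bayesian premium $E(X_{n+1}|\mathbf{X})$,
and $X_{n+1}$.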


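Example 16.24 If $E(X_j) = \mu$, $\mathrm{Var}(X_j) = \sigma^2$, and, for $i \neq j$, $\mathrm{Cov}(X_i, X_j) = \rho\sigma^2$,
where the correlation coefficient $\rho$ satisfies $-1 < \rho < 1$, determine the
credibility premium $\tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j X_j$.

The unbiasedness equation (16.30) yields

$$
\mu = \tilde{\alpha}_0 + \mu\sum_{j=1}^n \tilde{\alpha}_j
$$

or

$$
\sum_{j=1}^n \tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu}.
$$

The $n$ equations (16.32) become, for $i = 1, \ldots, n$,

$$
\rho = \sum_{j=1,\ j \neq i}^n \tilde{\alpha}_j\rho + \tilde{\alpha}_i
$$

or, stated another way,

$$
\rho = \sum_{j=1}^n \tilde{\alpha}_j\rho + \tilde{\alpha}_i(1 - \rho), \quad i = 1, \ldots, n.
$$

Thus

$$
\tilde{\alpha}_i = \frac{\rho\left(1 - \sum_{j=1}^n \tilde{\alpha}_j\right)}{1 - \rho} = \frac{\rho\tilde{\alpha}_0}{\mu(1 - \rho)}
$$

using the unbiasedness equation. Summation over $i$ from 1 to $n$ yields

$$
\sum_{i=1}^n \tilde{\alpha}_i = \sum_{j=1}^n \tilde{\alpha}_j = \frac{n\rho\tilde{\alpha}_0}{\mu(1 - \rho)},
$$

which combined with the unbiasedness equation gives an equation for $\tilde{\alpha}_0$,
namely

$$
1 - \frac{\tilde{\alpha}_0}{\mu} = \frac{n\rho\tilde{\alpha}_0}{\mu(1 - \rho)}.
$$

Solving for $\tilde{\alpha}_0$ yields

$$
\tilde{\alpha}_0 = \frac{(1 - \rho)\mu}{1 - \rho + n\rho}.
$$

Thus,

$$
\tilde{\alpha}_j = \frac{\rho\tilde{\alpha}_0}{\mu(1 - \rho)} = \frac{\rho}{1 - \rho + n\rho}.
$$

The credibility premium is then

$$
\begin{aligned}
\tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j X_j &= \frac{(1 - \rho)\mu}{1 - \rho + n\rho} + \sum_{j=1}^n \frac{\rho X_j}{1 - \rho + n\rho} \\
&= (1 - Z)\mu + Z\bar{X},
\end{aligned}
$$

where $Z = n\rho/(1 - \rho + n\rho)$ and $\bar{X} = n^{-1}\sum_{j=1}^n X_j$. Thus, if $0 < \rho < 1$, then
$0 < Z < 1$ and the credibility premium is a weighted average of $\mu = E(X_{n+1})$
and $\bar{X}$, that is, is of the form (16.19). □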
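The normal equations can also be solved numerically, which gives a check on the closed form of Example 16.24. The sketch below builds the equicorrelated covariance matrix, solves the $n$ equations (16.32) with a small pure-Python Gaussian elimination, and recovers $\tilde{\alpha}_0$ from the unbiasedness equation (16.30). The helper names and the numerical inputs are illustrative assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def credibility_weights(mu, sigma2, rho, n):
    """Solve the normal equations for the equicorrelated model of Example 16.24."""
    cov = [[sigma2 if i == j else rho * sigma2 for j in range(n)] for i in range(n)]
    rhs = [rho * sigma2] * n            # Cov(X_i, X_{n+1}) for i = 1, ..., n
    alphas = solve_linear(cov, rhs)     # equations (16.32)
    alpha0 = mu * (1 - sum(alphas))     # unbiasedness equation (16.30)
    return alpha0, alphas


alpha0, alphas = credibility_weights(mu=100.0, sigma2=25.0, rho=0.3, n=4)
print(alpha0, alphas)
```

For these inputs the closed form gives each weight as 0.3/1.9 and the constant as 70/1.9, which the numerical solution should reproduce up to floating-point error.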


We now turn to some models which specify the conditional means and
variances of $X_j|\Theta$ and hence the means $E(X_j)$, variances $\mathrm{Var}(X_j)$, and covariances
$\mathrm{Cov}(X_i, X_j)$.
16.4.4 The Bühlmann model


This, the first and simplest credibility model, specifies that for each
policyholder (conditional on $\Theta$) past losses $X_1, \ldots, X_n$ have the same mean and
variance and are independent and identically distributed conditional on $\Theta$.
Thus, define

$$
\mu(\theta) = E(X_j|\Theta = \theta)
$$

and

$$
v(\theta) = \mathrm{Var}(X_j|\Theta = \theta).
$$

As discussed previously, $\mu(\theta)$ is referred to as the hypothetical mean whereas
$v(\theta)$ is called the process variance. Define

$$
\mu = E[\mu(\Theta)], \tag{16.36}
$$

$$
v = E[v(\Theta)], \tag{16.37}
$$

and

$$
a = \mathrm{Var}[\mu(\Theta)]. \tag{16.38}
$$

The quantity $\mu$ in (16.36) is the expected value of the hypothetical
means, $v$ in (16.37) is the expected value of the process variance, and $a$
in (16.38) is the variance of the hypothetical means. Note that $\mu$ is the
estimate to use if we have no information about $\theta$ [and thus no information
about $\mu(\theta)$]. It will also be referred to as the collective premium.

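The mean, variance, and covariance of the $X_j$s may now be obtained. First,

$$
E(X_j) = E[E(X_j|\Theta)] = E[\mu(\Theta)] = \mu. \tag{16.39}
$$

Second,

$$
\begin{aligned}
\mathrm{Var}(X_j) &= E[\mathrm{Var}(X_j|\Theta)] + \mathrm{Var}[E(X_j|\Theta)] \\
&= E[v(\Theta)] + \mathrm{Var}[\mu(\Theta)] \\
&= v + a.
\end{aligned} \tag{16.40}
$$

Finally, for $i \neq j$,

$$
\begin{aligned}
\mathrm{Cov}(X_i, X_j) &= E(X_i X_j) - E(X_i)E(X_j) \\
&= E[E(X_i X_j|\Theta)] - \mu^2 \\
&= E[E(X_i|\Theta)E(X_j|\Theta)] - \{E[\mu(\Theta)]\}^2 \\
&= E\{[\mu(\Theta)]^2\} - \{E[\mu(\Theta)]\}^2 \\
&= \mathrm{Var}[\mu(\Theta)] \\
&= a.
\end{aligned} \tag{16.41}
$$

This is exactly of the form of Example 16.24 with parameters $\mu$, $\sigma^2 = v + a$,
and $\rho = a/(v + a)$. Thus the credibility premium is

$$
\tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j X_j = Z\bar{X} + (1 - Z)\mu, \tag{16.42}
$$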
where

$$
Z = \frac{n}{n + k} \tag{16.43}
$$

and

$$
k = \frac{v}{a} = \frac{E[\mathrm{Var}(X_j|\Theta)]}{\mathrm{Var}[E(X_j|\Theta)]}. \tag{16.44}
$$

The credibility factor $Z$ in (16.43) with $k$ given by (16.44) is referred to as the
Bühlmann credibility factor. Note that (16.42) is of the form (16.19), and
(16.43) is exactly (16.20). Now, however, we know how to obtain $k$, namely
from (16.44).

Formula (16.42) has many appealing features. First, the credibility
premium (16.42) is a weighted average of the sample mean $\bar{X}$ and the collective
premium $\mu$, a formula which we find desirable. Furthermore, $Z$ approaches
1 as $n$ increases, giving more credit to $\bar{X}$ rather than $\mu$ as more past data
accumulates, a feature which agrees with intuition. Also, if the population
is fairly homogeneous with respect to the risk parameter $\Theta$, then (relatively
speaking) the hypothetical means $\mu(\Theta) = E(X_j|\Theta)$ do not vary greatly with
$\Theta$ (i.e., they are close in value) and hence have small variability. Thus $a$ is
small relative to $v$, that is, $k$ is large and $Z$ is closer to 0. But this agrees with
intuition because for a homogeneous population the overall mean $\mu$ is of more
value in helping to predict next year's claims for a particular policyholder.
Conversely, for a heterogeneous population, the hypothetical means $E(X_j|\Theta)$
are more variable, that is, $a$ is large and $k$ is small, and so $Z$ is closer to 1.
Again this makes sense because in a heterogeneous population the experience
of other policyholders is of less value in predicting the future experience of a
particular policyholder than is the past experience of that policyholder.

We now present some examples. 




Example 16.25 (Example 16.20 continued) Determine the Bühlmann estimate
of $E(X_3|0, 1)$.

From earlier work,

$$
\begin{aligned}
\mu(G) &= E(X_j|G) = 0.4, & \mu(B) &= E(X_j|B) = 0.7, \\
\pi(G) &= 0.75, & \pi(B) &= 0.25,
\end{aligned}
$$

and therefore,

$$
\begin{aligned}
\mu &= \sum_{\theta} \mu(\theta)\pi(\theta) = 0.4(0.75) + 0.7(0.25) = 0.475, \\
a &= \sum_{\theta} \mu(\theta)^2\pi(\theta) - \mu^2 = 0.16(0.75) + 0.49(0.25) - 0.475^2 = 0.016875.
\end{aligned}
$$

For the process variance,

$$
\begin{aligned}
v(G) &= \mathrm{Var}(X_j|G) = 0^2(0.7) + 1^2(0.2) + 2^2(0.1) - 0.4^2 = 0.44, \\
v(B) &= \mathrm{Var}(X_j|B) = 0^2(0.5) + 1^2(0.3) + 2^2(0.2) - 0.7^2 = 0.61, \\
v &= \sum_{\theta} v(\theta)\pi(\theta) = 0.44(0.75) + 0.61(0.25) = 0.4825.
\end{aligned}
$$

Then (16.44) gives

$$
k = \frac{v}{a} = \frac{0.4825}{0.016875} = 28.5926
$$

and (16.43) gives

$$
Z = \frac{2}{2 + 28.5926} = 0.0654.
$$

The expected next value is then $0.0654(0.5) + 0.9346(0.475) = 0.4766$. This
is the best linear approximation to the Bayesian premium (given in Example
16.20). □
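The steps of Example 16.25 can be sketched as two functions: one that computes $\mu$, $v$, and $a$ for a finite mixture of discrete claim distributions, and one that applies (16.42) through (16.44). The function names are hypothetical; the inputs are the two-class model used in the example.

```python
prior = {"G": 0.75, "B": 0.25}
pmf = {"G": [0.7, 0.2, 0.1], "B": [0.5, 0.3, 0.2]}  # f(x | class), x = 0, 1, 2


def buhlmann_parameters(prior, pmf):
    """Return (mu, v, a): mean of hypothetical means, expected process variance,
    and variance of hypothetical means."""
    mean = {c: sum(x * p for x, p in enumerate(pmf[c])) for c in prior}
    var = {c: sum(x * x * p for x, p in enumerate(pmf[c])) - mean[c] ** 2 for c in prior}
    mu = sum(mean[c] * prior[c] for c in prior)
    v = sum(var[c] * prior[c] for c in prior)
    a = sum(mean[c] ** 2 * prior[c] for c in prior) - mu ** 2
    return mu, v, a


def buhlmann_premium(data, mu, v, a):
    """Credibility premium Z * xbar + (1 - Z) * mu with Z = n / (n + v / a)."""
    n = len(data)
    z = n / (n + v / a)
    return z * sum(data) / n + (1 - z) * mu


mu, v, a = buhlmann_parameters(prior, pmf)
premium = buhlmann_premium([0, 1], mu, v, a)
print(mu, v, a, premium)
```

The result can be compared with the Bayesian premium 0.478947 of Example 16.20; the two differ because the Bayesian premium is not linear in the data for this model.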


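Example 16.26 Suppose as in Example 16.23 (with $m_j = 1$) that $X_j|\Theta$, $j = 1, \ldots, n$,
are independently and identically Poisson distributed with (given)
mean $\Theta$ and $\Theta$ is gamma distributed with parameters $\alpha$ and $\beta$. Determine the
Bühlmann premium.

We have

$$
\mu(\theta) = E(X_j|\Theta = \theta) = \theta, \quad v(\theta) = \mathrm{Var}(X_j|\Theta = \theta) = \theta,
$$

and so

$$
\mu = E[\mu(\Theta)] = E(\Theta) = \alpha\beta, \quad v = E[v(\Theta)] = E(\Theta) = \alpha\beta,
$$

and

$$
a = \mathrm{Var}[\mu(\Theta)] = \mathrm{Var}(\Theta) = \alpha\beta^2.
$$

Then

$$
k = \frac{v}{a} = \frac{\alpha\beta}{\alpha\beta^2} = \frac{1}{\beta}, \quad Z = \frac{n}{n + k} = \frac{n}{n + 1/\beta} = \frac{n\beta}{n\beta + 1},
$$

and the credibility premium is

$$
Z\bar{X} + (1 - Z)\mu = \frac{n\beta}{n\beta + 1}\,\bar{X} + \frac{1}{n\beta + 1}\,\alpha\beta.
$$

But, as shown at the end of Example 16.23, this is also the Bayesian estimate
$E(X_{n+1}|\mathbf{X})$. Thus, the credibility premium equals the Bayesian estimate in
this case. □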
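The equality claimed in Example 16.26 is easy to confirm numerically. The sketch below computes the Bühlmann premium from the three moments and the Bayesian premium from the gamma posterior; the claim counts are made-up values and the function names are assumptions.

```python
def buhlmann_poisson_gamma(data, alpha, beta):
    """Buhlmann premium using mu = alpha*beta, v = alpha*beta, a = alpha*beta**2."""
    mu = alpha * beta
    v = alpha * beta
    a = alpha * beta ** 2
    n = len(data)
    z = n / (n + v / a)
    return z * sum(data) / n + (1 - z) * mu


def bayes_poisson_gamma(data, alpha, beta):
    """Posterior mean of Theta: (alpha + sum of counts) / (1 / beta + n)."""
    return (alpha + sum(data)) / (1.0 / beta + len(data))


data = [0, 2, 1, 0]  # hypothetical yearly claim counts
print(buhlmann_poisson_gamma(data, 3.0, 0.5), bayes_poisson_gamma(data, 3.0, 0.5))
```

For these inputs both functions return 1, since $k = 1/\beta = 2$, $Z = 4/6$, and the sample mean is 0.75.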


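Example 16.27 Determine the Bühlmann estimate for the setting in Example
16.22.

For this model,

$$
\begin{aligned}
\mu(\Theta) &= \Theta^{-1}, & \mu &= E(\Theta^{-1}) = \frac{\beta}{\alpha - 1}, \\
v(\Theta) &= \Theta^{-2}, & v &= E(\Theta^{-2}) = \frac{\beta^2}{(\alpha - 1)(\alpha - 2)},
\end{aligned}
$$

$$
\begin{aligned}
a &= \mathrm{Var}(\Theta^{-1}) = \frac{\beta^2}{(\alpha - 1)(\alpha - 2)} - \left(\frac{\beta}{\alpha - 1}\right)^2 = \frac{\beta^2}{(\alpha - 1)^2(\alpha - 2)}, \\
k &= \frac{v}{a} = \alpha - 1, \\
Z &= \frac{n}{n + k} = \frac{n}{n + \alpha - 1}, \\
P_c &= \frac{n}{n + \alpha - 1}\,\bar{X} + \frac{\alpha - 1}{n + \alpha - 1}\,\frac{\beta}{\alpha - 1},
\end{aligned}
$$

which again matches the Bayesian estimate.

An alternative analysis for this problem could have started with a single
observation of $S = X_1 + \cdots + X_n$. From the assumptions of the problem, $S$
has a mean of $n\Theta^{-1}$ and a variance of $n\Theta^{-2}$. While it is true that $S$ has a
gamma distribution, that information is not needed because the Bühlmann
approximation requires only moments. Following the above calculations,

$$
\begin{aligned}
\mu &= \frac{n\beta}{\alpha - 1}, \quad v = \frac{n\beta^2}{(\alpha - 1)(\alpha - 2)}, \quad a = \frac{n^2\beta^2}{(\alpha - 1)^2(\alpha - 2)}, \\
k &= \frac{\alpha - 1}{n}, \quad Z = \frac{1}{1 + k} = \frac{n}{n + \alpha - 1}.
\end{aligned}
$$

The key is to note that in calculating $Z$ the sample size is now 1, reflecting
the single observation of $S$. Because $S = n\bar{X}$, the Bühlmann estimate is

$$
P_c = \frac{n}{n + \alpha - 1}\,n\bar{X} + \frac{\alpha - 1}{n + \alpha - 1}\,\frac{n\beta}{\alpha - 1},
$$

which is $n$ times the previous answer. That is because we are now estimating
the next value of $S$ rather than the next value of $X$. However, the credibility
factor itself (that is, $Z$) is the same whether we are predicting $X_{n+1}$ or the
next value of $S$.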
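The two analyses in Example 16.27 can be compared numerically. This sketch assumes the gamma prior has shape `alpha` greater than 2 and rate `beta`, so that the first two moments of $\Theta^{-1}$ exist; the function names are illustrative.

```python
def inverse_moments(alpha, beta):
    """E(1/Theta) and E(1/Theta**2) for Theta ~ gamma(shape alpha, rate beta), alpha > 2."""
    m1 = beta / (alpha - 1)
    m2 = beta ** 2 / ((alpha - 1) * (alpha - 2))
    return m1, m2


def premium_per_claim(claims, alpha, beta):
    """Buhlmann estimate of the next claim X_{n+1} from n separate observations."""
    m1, m2 = inverse_moments(alpha, beta)
    mu, v, a = m1, m2, m2 - m1 ** 2
    n = len(claims)
    z = n / (n + v / a)
    return z * sum(claims) / n + (1 - z) * mu


def premium_for_total(claims, alpha, beta):
    """Buhlmann estimate of the next total, treating S = sum of claims as one observation."""
    m1, m2 = inverse_moments(alpha, beta)
    n = len(claims)
    mu, v, a = n * m1, n * m2, n ** 2 * (m2 - m1 ** 2)
    z = 1 / (1 + v / a)  # sample size is 1 for the single observation S
    return z * sum(claims) + (1 - z) * mu


claims = [100.0, 950.0, 450.0]
print(premium_per_claim(claims, 4, 1000.0), premium_for_total(claims, 4, 1000.0))
```

With the data of Example 16.21 the per-claim estimate is 2,500/6 and the estimate for the total is three times that amount.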


16.4.5 The Bühlmann-Straub model


The Bühlmann model of the previous section is the simplest of the
credibility models because it effectively requires that the past claims experience of
a policyholder comprise independent and identically distributed components
with respect to each past year. An important practical difficulty with this
assumption is that it does not allow for variations in exposure or size.

For example, what if the first year's claims experience of a policyholder
reflected only a portion of a year due to an unusual policyholder anniversary?
What if a benefit change occurred part way through a policy year? For group
insurance, what if the size of the group changed over time?

To handle these variations, we consider the following generalization of the
Bühlmann model. Assume that $X_1, \ldots, X_n$ are independent, conditional on
$\Theta$, with common mean (as before)

$$
\mu(\theta) = E(X_j|\Theta = \theta)
$$

but with conditional variances

$$
\mathrm{Var}(X_j|\Theta = \theta) = \frac{v(\theta)}{m_j},
$$

where $m_j$ is a known constant measuring exposure. Note that $m_j$ need only be
proportional to the size of the risk. This model would be appropriate if each
$X_j$ were the average of $m_j$ independent (conditional on $\Theta$) random variables
each with mean $\mu(\theta)$ and variance $v(\theta)$. In the above situations, $m_j$ could be
the number of months the policy was in force in past year $j$, or the number
of individuals in the group in past year $j$, or the amount of premium income
for the policy in past year $j$.
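As in the Bühlmann model, let

$$
\mu = E[\mu(\Theta)], \quad v = E[v(\Theta)],
$$

and

$$
a = \mathrm{Var}[\mu(\Theta)].
$$

Then, for the unconditional moments, from (16.39) $E(X_j) = \mu$, and from
(16.41) $\mathrm{Cov}(X_i, X_j) = a$, but

$$
\begin{aligned}
\mathrm{Var}(X_j) &= E[\mathrm{Var}(X_j|\Theta)] + \mathrm{Var}[E(X_j|\Theta)] \\
&= E\left[\frac{v(\Theta)}{m_j}\right] + \mathrm{Var}[\mu(\Theta)] \\
&= \frac{v}{m_j} + a.
\end{aligned}
$$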

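To obtain the credibility premium (16.33), we will solve the normal
equations (16.30) and (16.32) to obtain $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$. For notational convenience,
define

$$
m = m_1 + m_2 + \cdots + m_n
$$

to be the total exposure. Then using (16.39) the unbiasedness equation (16.30)
becomes

$$
\mu = \tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j\mu,
$$

which implies

$$
\sum_{j=1}^n \tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu}. \tag{16.45}
$$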
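For $i = 1, \ldots, n$, (16.32) becomes

$$
a = \sum_{j=1,\ j \neq i}^n \tilde{\alpha}_j a + \tilde{\alpha}_i\left(a + \frac{v}{m_i}\right) = \sum_{j=1}^n \tilde{\alpha}_j a + \frac{v\tilde{\alpha}_i}{m_i},
$$

which may be rewritten as

$$
\tilde{\alpha}_i = \frac{a}{v}\,m_i\left(1 - \sum_{j=1}^n \tilde{\alpha}_j\right) = \frac{a}{v}\,\frac{\tilde{\alpha}_0}{\mu}\,m_i, \quad i = 1, \ldots, n. \tag{16.46}
$$

Then, using (16.45) and (16.46),

$$
1 - \frac{\tilde{\alpha}_0}{\mu} = \sum_{j=1}^n \tilde{\alpha}_j = \sum_{i=1}^n \tilde{\alpha}_i = \frac{a}{v}\,\frac{\tilde{\alpha}_0}{\mu}\sum_{i=1}^n m_i = \frac{a\tilde{\alpha}_0 m}{\mu v},
$$

and so

$$
\tilde{\alpha}_0 = \frac{\mu}{1 + am/v} = \frac{v/a}{m + v/a}\,\mu.
$$

But this means that

$$
\tilde{\alpha}_j = \frac{a\tilde{\alpha}_0 m_j}{\mu v} = \frac{m_j}{m + v/a}.
$$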
The credibility premium (16.33) becomes

$$
\tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j X_j = Z\bar{X} + (1 - Z)\mu, \tag{16.47}
$$

where with $k = v/a$ from (16.44)

$$
Z = \frac{m}{m + k}
$$

and

$$
\bar{X} = \sum_{j=1}^n \frac{m_j}{m}\,X_j. \tag{16.48}
$$


Clearly, the credibility premium (16.47) is still of the form (16.19). In this case,
$m$ is the total exposure associated with the policyholder, and the
Bühlmann-Straub credibility factor $Z$ depends on $m$. Furthermore, $\bar{X}$ is a weighted
average of the $X_j$, with weights proportional to $m_j$. Following the group
interpretation, $X_j$ is the average loss of the $m_j$ group members in year $j$
and so $m_j X_j$ is the total loss of the group in year $j$. Then $\bar{X}$ is the overall
average loss per group member over the $n$ years. The credibility premium to
be charged to the group in year $n + 1$ would thus be $m_{n+1}[Z\bar{X} + (1 - Z)\mu]$
for $m_{n+1}$ members in the next year.
Had we known that (16.48) would be the correct weighting of the $X_j$ to
receive the credibility weight $Z$, the rest would have been easy. For the single
observation $\bar{X}$ the process variance is

$$
\mathrm{Var}(\bar{X}|\Theta) = \sum_{j=1}^n \frac{m_j^2}{m^2}\,\frac{v(\Theta)}{m_j} = \frac{v(\Theta)}{m}
$$

and so the expected process variance is $v/m$. The variance of the hypothetical
means is still $a$ and therefore $k = v/(am)$. There is only one observation of
$\bar{X}$ and so the credibility factor is

$$
Z = \frac{1}{1 + v/(am)} = \frac{m}{m + v/a} \tag{16.49}
$$

as before. Equation (16.48) should not have been surprising because the
weights are simply inversely proportional to the (conditional) variance of each
$X_j$.
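The Bühlmann-Straub premium (16.47) can be sketched as a single function of the observed averages, the exposures, and the three structural parameters. The function name and the sample inputs are hypothetical; the parameter values are chosen to correspond to a Poisson-gamma model with $\alpha = 2$ and $\beta = 0.1$, so that $\mu = v = 0.2$ and $a = 0.02$.

```python
def buhlmann_straub(xs, ms, mu, v, a):
    """Return (premium per exposure unit, Z, exposure-weighted mean) under (16.47)-(16.48)."""
    m = sum(ms)                                         # total exposure
    xbar = sum(mj * xj for mj, xj in zip(ms, xs)) / m   # (16.48)
    z = m / (m + v / a)                                 # credibility factor with k = v / a
    return z * xbar + (1 - z) * mu, z, xbar


ms = [10, 20, 15]                 # hypothetical exposures m_j
counts = [3, 5, 2]                # hypothetical total claims per year
xs = [c / mj for c, mj in zip(counts, ms)]  # average claims per exposure unit
premium, z, xbar = buhlmann_straub(xs, ms, mu=0.2, v=0.2, a=0.02)
print(premium, z, xbar)
print(25 * premium)               # premium for a hypothetical 25 exposure units next year
```

Because the weights in `xbar` are proportional to the exposures, years with more exposure (and hence lower conditional variance) contribute more to the estimate.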


Example 16.28 As in Example 16.23, assume that in year $j$ there are $N_j$
claims from $m_j$ policies, $j = 1, \ldots, n$. An individual policy has the Poisson
distribution with parameter $\Theta$ and the parameter itself has the gamma
distribution with parameters $\alpha$ and $\beta$. Determine the Bühlmann-Straub estimate
of the number of claims in year $n + 1$ if there will be $m_{n+1}$ policies.

In order to meet the conditions of this model, let $X_j = N_j/m_j$. Because
$N_j$ has the Poisson distribution with mean $m_j\Theta$, $E(X_j|\Theta) = \Theta = \mu(\Theta)$ and
$\mathrm{Var}(X_j|\Theta) = \Theta/m_j = v(\Theta)/m_j$. Then,

$$
\begin{aligned}
\mu &= E(\Theta) = \alpha\beta, \quad a = \mathrm{Var}(\Theta) = \alpha\beta^2, \quad v = E(\Theta) = \alpha\beta, \\
k &= \frac{1}{\beta}, \quad Z = \frac{m}{m + 1/\beta} = \frac{m\beta}{m\beta + 1},
\end{aligned}
$$

and the estimate for one policyholder is

$$
P_c = \frac{m\beta}{m\beta + 1}\,\bar{X} + \frac{1}{m\beta + 1}\,\alpha\beta,
$$

where $\bar{X} = m^{-1}\sum_{j=1}^n m_j X_j$. For year $n + 1$, the estimate is $m_{n+1}P_c$,
matching the answer to Example 16.23. □


The assumptions underlying the Bühlmann-Straub model may be too re- 
strictive to represent reality. In a 1967 paper, Hewitt [55] observed that large 
risks do not behave the same as an independent aggregation of small risks 
and, in fact, are more variable than would be indicated by independence. A 
model that reflects this observation is created in the following example. 
Example 16.29 Let the conditional mean be $E(X_j|\Theta) = \mu(\Theta)$ and the
conditional variance be $\mathrm{Var}(X_j|\Theta) = w(\Theta) + v(\Theta)/m_j$. Further assume that
$X_1, \ldots, X_n$ are conditionally independent given $\Theta$. Show that this model
supports Hewitt's observation and determine the credibility premium.

Consider independent risks $i$ and $j$ with exposures $m_i$ and $m_j$ and with a
common value of $\Theta$. When aggregated, the variance of the average loss is

$$
\begin{aligned}
\mathrm{Var}\left(\frac{m_i X_i + m_j X_j}{m_i + m_j}\,\Big|\,\Theta\right) &= \left(\frac{m_i}{m_i + m_j}\right)^2\mathrm{Var}(X_i|\Theta) + \left(\frac{m_j}{m_i + m_j}\right)^2\mathrm{Var}(X_j|\Theta) \\
&= \frac{m_i^2 + m_j^2}{(m_i + m_j)^2}\,w(\Theta) + \frac{1}{m_i + m_j}\,v(\Theta),
\end{aligned}
$$

while a single risk with exposure $m_i + m_j$ has variance $w(\Theta) + v(\Theta)/(m_i + m_j)$,
which is larger.
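With regard to the credibility premium, we have

$$
\begin{aligned}
E(X_j) &= E[E(X_j|\Theta)] = E[\mu(\Theta)] = \mu, \\
\mathrm{Var}(X_j) &= E[\mathrm{Var}(X_j|\Theta)] + \mathrm{Var}[E(X_j|\Theta)] \\
&= E\left[w(\Theta) + \frac{v(\Theta)}{m_j}\right] + \mathrm{Var}[\mu(\Theta)] \\
&= w + \frac{v}{m_j} + a,
\end{aligned}
$$

and for $i \neq j$, $\mathrm{Cov}(X_i, X_j) = a$ as in (16.41). The unbiasedness equation is
still

$$
\mu = \tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j\mu
$$

and so

$$
\sum_{j=1}^n \tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu}.
$$

Equation (16.32) becomes

$$
\begin{aligned}
a &= \sum_{j=1}^n \tilde{\alpha}_j a + \tilde{\alpha}_i\left(w + \frac{v}{m_i}\right) \\
&= a\left(1 - \frac{\tilde{\alpha}_0}{\mu}\right) + \tilde{\alpha}_i\left(w + \frac{v}{m_i}\right), \quad i = 1, \ldots, n.
\end{aligned}
$$

Therefore,

$$
\tilde{\alpha}_i = \frac{a\tilde{\alpha}_0/\mu}{w + v/m_i}.
$$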
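Summing both sides yields

$$
\frac{a\tilde{\alpha}_0}{\mu}\sum_{j=1}^n \frac{m_j}{v + wm_j} = \sum_{j=1}^n \tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu},
$$

and so

$$
\tilde{\alpha}_0 = \frac{1}{\dfrac{a}{\mu}\sum_{j=1}^n \dfrac{m_j}{v + wm_j} + \dfrac{1}{\mu}} = \frac{\mu}{1 + am^*},
$$

where

$$
m^* = \sum_{j=1}^n \frac{m_j}{v + wm_j}.
$$

Then

$$
\tilde{\alpha}_j = \frac{am_j}{v + wm_j}\,\frac{1}{1 + am^*}.
$$

The credibility premium is

$$
\frac{\mu}{1 + am^*} + \frac{a}{1 + am^*}\sum_{j=1}^n \frac{m_j X_j}{v + wm_j}.
$$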

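The sum can be made to define a weighted average of the observations by
letting

$$
\bar{X} = \frac{\sum_{j=1}^n \dfrac{m_j}{v + wm_j}\,X_j}{\sum_{j=1}^n \dfrac{m_j}{v + wm_j}} = \frac{1}{m^*}\sum_{j=1}^n \frac{m_j}{v + wm_j}\,X_j.
$$

If we now set

$$
Z = \frac{am^*}{1 + am^*},
$$

the credibility premium is

$$
Z\bar{X} + (1 - Z)\mu.
$$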

Observe what happens as the exposures $m_j$ go to infinity. The credibility
factor becomes

$$
Z \to \frac{an/w}{1 + an/w} < 1.
$$

Contrast this to the Bühlmann-Straub model where the limit is 1. Thus,
no matter how large the risk, there is a limit to its credibility. A further
generalization of this result is provided in Exercise 16.26. □
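The behavior of the credibility factor in Example 16.29 can be explored numerically. This is a minimal sketch with made-up parameter values; the function names are assumptions. It evaluates $m^*$, $Z$, and the premium, and shows the limiting value of $Z$ for very large exposures.

```python
def m_star(ms, v, w):
    """m* = sum over j of m_j / (v + w * m_j)."""
    return sum(mj / (v + w * mj) for mj in ms)


def hewitt_credibility(ms, v, w, a):
    """Credibility factor Z = a * m* / (1 + a * m*)."""
    s = m_star(ms, v, w)
    return a * s / (1 + a * s)


def hewitt_premium(xs, ms, mu, v, w, a):
    """Credibility premium Z * xbar + (1 - Z) * mu with the weighted mean of Example 16.29."""
    s = m_star(ms, v, w)
    xbar = sum(mj * xj / (v + w * mj) for mj, xj in zip(ms, xs)) / s
    z = a * s / (1 + a * s)
    return z * xbar + (1 - z) * mu


# With w = 0 the factor reduces to the Buhlmann-Straub value m / (m + v / a).
print(hewitt_credibility([10, 20, 15], v=0.2, w=0.0, a=0.02))
# With w > 0 and very large exposures, Z approaches (a*n/w) / (1 + a*n/w).
print(hewitt_credibility([1e12] * 3, v=1.0, w=0.5, a=0.2))
```

In the second call the limit is $(0.2 \cdot 3/0.5)/(1 + 0.2 \cdot 3/0.5) = 1.2/2.2$, which stays below 1 however large the exposures become.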


Another generalization is provided by letting the variance of $\mu(\Theta)$ depend
on the exposure. This may be reasonable if we believe that the extent to which
a given risk's propensity to produce claims differs from the mean is related
to its size. For example, larger risks may be underwritten more carefully. In
this case, extreme variations from the mean are less likely because we ensure
that the risk not only meets the underwriting requirements but also appears
to be exactly what it claims to be.


Example 16.30 (Example 16.29 continued) In addition to the specification
presented in Example 16.29, let $\mathrm{Var}[\mu(\Theta)] = a + b/m$, where $m = \sum_{j=1}^n m_j$
is the total exposure for the group. Develop the credibility formula.

We now have

$$
\begin{aligned}
E(X_j) &= E[E(X_j|\Theta)] = E[\mu(\Theta)] = \mu, \\
\mathrm{Var}(X_j) &= E[\mathrm{Var}(X_j|\Theta)] + \mathrm{Var}[E(X_j|\Theta)] \\
&= E\left[w(\Theta) + \frac{v(\Theta)}{m_j}\right] + \mathrm{Var}[\mu(\Theta)] \\
&= w + \frac{v}{m_j} + a + \frac{b}{m},
\end{aligned}
$$

and for $i \neq j$

$$
\begin{aligned}
\mathrm{Cov}(X_i, X_j) &= E[E(X_i X_j|\Theta)] - \mu^2 \\
&= E[\mu(\Theta)^2] - \mu^2 \\
&= a + \frac{b}{m}.
\end{aligned}
$$

It can be seen that all the calculations used in Example 16.29 apply here
with $a$ replaced by $a + b/m$. The credibility factor is

$$
Z = \frac{(a + b/m)m^*}{1 + (a + b/m)m^*}
$$

and the credibility premium is

$$
Z\bar{X} + (1 - Z)\mu
$$

with $\bar{X}$ and $m^*$ defined as in Example 16.29. This particular credibility
formula has been used in workers compensation experience rating. One example
of this is presented in detail in [45]. □
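The credibility factor of Example 16.30 differs from that of Example 16.29 only in that $a$ is replaced by $a + b/m$. A minimal sketch, with hypothetical parameter values and an illustrative function name:

```python
def size_dependent_credibility(ms, v, w, a, b):
    """Z = (a + b/m) m* / (1 + (a + b/m) m*), with m the total exposure."""
    m = sum(ms)
    m_star = sum(mj / (v + w * mj) for mj in ms)
    a_eff = a + b / m  # Var[mu(Theta)] for a group with total exposure m
    return a_eff * m_star / (1 + a_eff * m_star)


ms = [10, 20, 15]
z_without_b = size_dependent_credibility(ms, v=0.2, w=0.01, a=0.02, b=0.0)
z_with_b = size_dependent_credibility(ms, v=0.2, w=0.01, a=0.02, b=0.5)
print(z_without_b, z_with_b)
```

Setting `b` to zero recovers the factor of Example 16.29, and a positive `b` raises the variance of the hypothetical means and therefore the credibility factor.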


16.4.6 Exact credibility 


In Examples 16.26-16.28 we found that the credibility premium and the
Bayesian premium were equal. From (16.34), one may view the credibility
premium as the best linear approximation to the Bayesian premium in the
sense of squared error loss. In these examples the approximation is exact
because the two premiums are equal. The term exact credibility is used
to describe the situation when the credibility premium equals the Bayesian
premium.

In fact, it is not hard to see that one can ascertain whether credibility
is exact without even calculating the credibility premium. If the Bayesian
premium is a linear function of $X_1, \ldots, X_n$,

$$
E(X_{n+1}|\mathbf{X}) = a_0 + \sum_{j=1}^n a_j X_j,
$$

then it is clear that in (16.34) the quantity $Q_1$ attains its minimum value
of zero with $\tilde{\alpha}_j = a_j$ for $j = 0, 1, \ldots, n$. Thus the credibility premium is
$\tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j X_j = a_0 + \sum_{j=1}^n a_j X_j = E(X_{n+1}|\mathbf{X})$ and credibility is exact.

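This phenomenon occurs fairly generally in connection with linear
exponential family members (Section 12.4.3) and their conjugate priors. We
parameterize such that $X_j|\Theta = \theta$ is independently (conditional on $\Theta = \theta$) distributed
with pf for $j = 1, \ldots, n + 1$,

$$
f_{X_j|\Theta}(x_j|\theta) = \frac{p(x_j)e^{-\theta x_j}}{q(\theta)}
$$

and $\Theta$ has pdf

$$
\pi(\theta) = \frac{[q(\theta)]^{-k} e^{-\mu k\theta}}{c(\mu, k)}, \quad \theta_0 < \theta < \theta_1, \tag{16.50}
$$

where $-\infty < \theta_0 < \theta_1 < \infty$. It is also assumed that $\pi(\theta_0) = \pi(\theta_1) = 0$. For
the moment, $\mu$ and $k$ are simply parameters of $\pi(\theta)$. We will now demonstrate
that the choice of symbols was no coincidence.
In Section 12.4.3 it was shown that

$$
\mu(\theta) = E(X_j|\Theta = \theta) = -\frac{q'(\theta)}{q(\theta)}.
$$

We wish to find $E[\mu(\Theta)]$. From (16.50),

$$
\ln\pi(\theta) = -k\ln q(\theta) - \mu k\theta - \ln c(\mu, k)
$$

and differentiating with respect to $\theta$ gives

$$
\frac{\pi'(\theta)}{\pi(\theta)} = -\frac{kq'(\theta)}{q(\theta)} - \mu k.
$$

In other words,

$$
\pi'(\theta) = k[\mu(\theta) - \mu]\pi(\theta) \tag{16.51}
$$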


and integrating from $\theta_0$ to $\theta_1$ gives

$$
\pi(\theta_1) - \pi(\theta_0) = k\int_{\theta_0}^{\theta_1} \mu(\theta)\pi(\theta)\,d\theta - k\mu\int_{\theta_0}^{\theta_1} \pi(\theta)\,d\theta.
$$

This implies that $0 = kE[\mu(\Theta)] - k\mu$, or equivalently,

$$
E[\mu(\Theta)] = \mu. \tag{16.52}
$$
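Now consider the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$. It is proportional to

$$
\left[\prod_{j=1}^n f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta),
$$

itself proportional to

$$
\begin{aligned}
\left[\prod_{j=1}^n \frac{e^{-\theta x_j}}{q(\theta)}\right][q(\theta)]^{-k} e^{-\mu k\theta} &= [q(\theta)]^{-(n+k)} e^{-\theta(\mu k + n\bar{x})} \\
&= [q(\theta)]^{-k_*} e^{-\theta\mu_* k_*},
\end{aligned} \tag{16.53}
$$

where

$$
k_* = n + k
$$

and

$$
\mu_* = \frac{\mu k + n\bar{x}}{k + n} = \frac{n}{n + k}\,\bar{x} + \frac{k}{n + k}\,\mu.
$$

Observe that (16.53) is proportional to a density of the form (16.50) with $\mu$
and $k$ replaced by $\mu_*$ and $k_*$, respectively. Hence

$$
\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) = \frac{[q(\theta)]^{-k_*} e^{-\mu_* k_*\theta}}{c(\mu_*, k_*)}, \quad \theta_0 < \theta < \theta_1.
$$

From (16.28) and using the same development that led to (16.52), the Bayesian
premium is

$$
\begin{aligned}
E(X_{n+1}|\mathbf{X} = \mathbf{x}) &= \int_{\theta_0}^{\theta_1} \mu(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta \\
&= \mu_* \\
&= Z\bar{x} + (1 - Z)\mu,
\end{aligned}
$$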


where $Z = n/(n + k)$. This is of the form (16.19), and because it is a linear
function of the $x_j$s, credibility must be exact, that is, the credibility premium
is

$$
\tilde{\alpha}_0 + \sum_{j=1}^n \tilde{\alpha}_j X_j = Z\bar{X} + (1 - Z)\mu = E(X_{n+1}|\mathbf{X}).
$$

Because the $X_j|\Theta$ are also identically distributed for $j = 1, \ldots, n$, the Bühlmann
model applies and so (16.42) also applies; that is, $k$ must also satisfy (16.44).
To see this directly, recall from Section 12.4.3 that

$$
v(\theta) = \mathrm{Var}(X_j|\Theta = \theta) = -\mu'(\theta).
$$
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Differentiation of (16.51) yields 
m8) = kul (0)n(8) + kèlu) — pl?2n(6) 
= —ku(6)(8) + k[y(8) — p]?7 (0). 
Integration with respect to 0 from ĝo to 8, yields 
m'(81) — 7 (80) —kElv(©)] + E{[n(©) — u]*} 
= —kv+ ka 


because (©) has mean yw and E{[u(©) — u]?} = Var[u(©)] = a. If r (01) = 
x’ (fo) = 0, this implies that k = v/a, and so (16.44) is satisfied. 
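
As a numerical illustration of exact credibility, the following sketch takes the Poisson model with a gamma prior (a linear exponential family member with its conjugate prior) and checks that the Bayesian premium, obtained here by brute-force midpoint integration of the posterior, agrees with $Z\bar{x} + (1-Z)\mu$ where $k = v/a$. The function names, the ten observed counts, and the prior parameters (shape 50, scale 0.1) are illustrative assumptions, not values taken from the derivation above.

```python
import math


def posterior_mean_poisson_gamma(xs, alpha, scale, upper=40.0, steps=40_000):
    """Posterior mean of a Poisson mean under a gamma(alpha, scale) prior.

    Computed by midpoint-rule integration of the unnormalized posterior
    lam**(alpha - 1 + sum(xs)) * exp(-lam * (n + 1/scale)) on (0, upper).
    """
    n, s = len(xs), sum(xs)
    rate = n + 1.0 / scale
    power = alpha - 1 + s

    def log_kernel(lam):
        return power * math.log(lam) - rate * lam

    shift = log_kernel(power / rate)  # log-kernel at the mode, avoids overflow
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        w = math.exp(log_kernel(lam) - shift)
        num += lam * w
        den += w
    return num / den


def buhlmann_premium(xs, mu, v, a):
    """Credibility premium Z * xbar + (1 - Z) * mu with Z = n / (n + v/a)."""
    n = len(xs)
    z = n / (n + v / a)
    return z * sum(xs) / n + (1 - z) * mu


# Hypothetical data: ten observed claim counts (sum 37, mean 3.7).
xs = [3, 5, 2, 4, 6, 3, 4, 2, 5, 3]
alpha, scale = 50.0, 0.1
mu = alpha * scale        # E[mu(Theta)] = E(Theta)
v = alpha * scale         # E[v(Theta)] = E(Theta) for the Poisson
a = alpha * scale ** 2    # Var[mu(Theta)] = Var(Theta)

bayes = posterior_mean_poisson_gamma(xs, alpha, scale)
cred = buhlmann_premium(xs, mu, v, a)
print(round(bayes, 6), round(cred, 6))
```

Under these assumptions both quantities equal 4.35, as the exact credibility result predicts; the integration limit and step count are arbitrary choices that are more than adequate for this posterior.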


16.4.7 Linear versus Bayesian versus no credibility 


In Section 16.4.3 it was demonstrated that the credibility premium is the best 
linear estimator in the sense of minimizing the expected squared error with 
respect to the next observation, Xn+1. In Exercise 16.59 you are asked to 
demonstrate that the Bayesian premium is the best estimator with no restric- 
tions, in the same least squares sense. It was also demonstrated in Section 
16.4.3 that the credibility premium is the linear estimator that is closest to 
the Bayesian estimator, again in the mean squared error sense. Finally, we 
have seen that in a number of cases the credibility and Bayesian premiums 
are the same. This leaves two questions. Is the additional error caused by 
using the credibility premium in place of the Bayesian premium worth wor- 
rying about? Is it worthwhile to go through the bother of using credibility in 
the first place? While the exact answer to these questions depends on the un- 
derlying distributions, we can obtain some feel for the answers by considering 
two examples. 

We begin with the second question and use a common situation that has 
already been discussed. What makes credibility work is that we expect to 
perform numerous estimations. As a result, we are willing to be biased in 
any one estimation provided that the biases cancel out over the numerous 
estimations. This allows us to reduce variability and, therefore, squared error. 
The following example shows the power of credibility in this setting. 


Example 16.31 Suppose there are 50 occasions on which we obtain a random
sample of size 10 from a Poisson distribution with unknown mean. The samples
are from different Poisson populations and therefore may involve different
means. Let the true means be $\theta_1,\ldots,\theta_{50}$. Further assume that the Poisson
parameters are drawn from a gamma distribution with parameters $\alpha = 50$ and
$\theta = 0.1$. Compare the maximum likelihood estimates $\bar{X}_j$, $j = 1,\ldots,50$, to
the credibility estimates $C_j = (\bar{X}_j + 5)/2$. Note that this is the Bühlmann
credibility estimate.


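We will first analyze the two estimates by determining their respective
mean-squared errors. Using the sample mean, the total squared error is, where
$\Theta = (\Theta_1,\ldots,\Theta_{50})$,

$$S_1 = \sum_{j=1}^{50} (\bar{X}_j - \Theta_j)^2,$$

and the mean-squared error is

$$E(S_1) = E[E(S_1|\Theta)] = E\left[\sum_{j=1}^{50} \mathrm{Var}(\bar{X}_j|\Theta_j)\right] = E\left(\sum_{j=1}^{50} \frac{\Theta_j}{10}\right) = 25.$$

Using the credibility estimator, the squared error is

$$S_2 = \sum_{j=1}^{50} (0.5\bar{X}_j + 2.5 - \Theta_j)^2$$

and the mean-squared error is

$$\begin{aligned}
E(S_2) &= E[E(S_2|\Theta)] \\
&= E\left[\sum_{j=1}^{50} E(0.25\bar{X}_j^2 + 6.25 + \Theta_j^2 + 2.5\bar{X}_j - 5\Theta_j - \bar{X}_j\Theta_j \,|\, \Theta_j)\right] \\
&= E\left\{\sum_{j=1}^{50} \left[0.25\left(\frac{\Theta_j}{10} + \Theta_j^2\right) + 6.25 + \Theta_j^2 + 2.5\Theta_j - 5\Theta_j - \Theta_j^2\right]\right\} \\
&= \sum_{j=1}^{50} [0.25(0.5 + 25.5) + 6.25 + 25.5 + 2.5(5) - 5(5) - 25.5] \\
&= 12.5.
\end{aligned}$$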


Of course, we “cheated” a bit. We used squared error as our criterion and 
so knew in advance that the Bühlmann estimate would have the smaller value 
given that it is competing against another linear estimator. The interesting 
part is the significant improvement that resulted. This means that, even if the 
components of the credibility formula $Z$ and $\mu$ were not set at their optimal
values, the credibility formula is still likely to result in an improvement. 
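
The claim that nonoptimal values of $Z$ can still improve on the sample mean can be checked with a short calculation. The sketch below assumes the setting of Example 16.31 (50 groups, samples of size 10, gamma prior with $\alpha = 50$ and $\theta = 0.1$) and uses the decomposition $Z\bar{X}_j + (1-Z)\mu - \Theta_j = Z(\bar{X}_j - \Theta_j) + (1-Z)(\mu - \Theta_j)$, whose cross term has expectation zero. The function name and its defaults are illustrative.

```python
def expected_total_squared_error(z, n=10, groups=50, alpha=50.0, scale=0.1):
    """E[sum_j (z*Xbar_j + (1-z)*mu - Theta_j)^2] for Poisson data, gamma prior.

    Per group the expectation is z^2 * E(Theta)/n + (1-z)^2 * Var(Theta).
    """
    v = alpha * scale        # E[Var(X | Theta)] = E(Theta) for the Poisson
    a = alpha * scale ** 2   # Var(Theta)
    per_group = z ** 2 * v / n + (1 - z) ** 2 * a
    return groups * per_group


for z in (1.0, 0.8, 0.5, 0.3):
    print(z, round(expected_total_squared_error(z), 3))
```

With $Z = 1$ (the sample mean) the result is 25 and with $Z = 0.5$ it is 12.5, matching $E(S_1)$ and $E(S_2)$; intermediate or somewhat smaller values of $Z$ also fall below 25 in this setting.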

To get a feel for how this improvement comes about, consider a specific set
of 50 values of $\theta_j$. The ones presented in Table 16.3 are a random sample from
the prior gamma distribution sorted in increasing order. The next column
provides the mean-squared error of the sample mean ($\theta_j/10$). The final three
columns provide the bias, variance, and mean-squared error for the credibility
estimator based on $Z = 0.5$ and $\mu = 5$. The sample mean is always unbiased
and therefore the variance matches the mean-squared error and so these two
Table 16.3 A comparison of the sample mean and the credibility estimator

         $\bar{X}$    $0.5\bar{X} + 2.5$               $\bar{X}$    $0.5\bar{X} + 2.5$
$\theta$   MSE    Bias   Var.   MSE      $\theta$   MSE    Bias   Var.   MSE

3.510   .351   .745   .088   .643     4.875   .488    .062   .122   .126
3.637   .364   .681   .091   .555     4.894   .489    .053   .122   .125
3.742   .374   .629   .094   .489     4.900   .490    .050   .123   .125
3.764   .376   .618   .094   .476     4.943   .494    .028   .124   .124
3.793   .379   .604   .095   .459     4.977   .498    .012   .124   .125
4.000   .400   .500   .100   .350     5.002   .500   -.001   .125   .125
4.151   .415   .424   .104   .284     5.013   .501   -.006   .125   .125
4.153   .415   .424   .104   .283     5.108   .511   -.054   .128   .131
4.291   .429   .354   .107   .233     5.172   .517   -.086   .129   .137
4.405   .440   .298   .110   .199     5.198   .520   -.099   .130   .140
4.410   .441   .295   .110   .197     5.231   .523   -.116   .131   .144
4.413   .441   .293   .110   .196     5.239   .524   -.120   .131   .145
4.430   .443   .285   .111   .192     5.263   .526   -.132   .132   .149
4.438   .444   .281   .111   .190     5.300   .530   -.150   .132   .155
4.471   .447   .264   .112   .182     5.338   .534   -.169   .133   .162
4.491   .449   .254   .112   .177     5.400   .540   -.200   .135   .175
4.495   .449   .253   .112   .176     5.407   .541   -.203   .135   .176
4.505   .451   .247   .113   .174     5.431   .543   -.215   .136   .182
4.547   .455   .227   .114   .165     5.459   .546   -.229   .136   .189
4.606   .461   .197   .115   .154     5.510   .551   -.255   .138   .203
4.654   .465   .173   .116   .146     5.538   .554   -.269   .138   .211
4.758   .476   .121   .119   .134     5.646   .565   -.323   .141   .246
4.763   .476   .118   .119   .133     5.837   .584   -.419   .146   .321
4.766   .477   .117   .119   .133     5.937   .594   -.468   .148   .368
4.796   .480   .102   .120   .130     6.263   .626   -.631   .157   .555

Mean (all 50 values)                          .482    .091   .120   .222


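quantities are not presented. For the credibility estimator,

$$\text{Bias} = E(0.5\bar{X}_j + 2.5 - \theta_j) = 2.5 - 0.5\theta_j,$$

$$\text{Variance} = \mathrm{Var}(0.5\bar{X}_j + 2.5) = \frac{0.25\theta_j}{10} = 0.025\theta_j,$$

$$\text{Mean-squared error} = \text{bias}^2 + \text{variance} = 0.25\theta_j^2 - 2.475\theta_j + 6.25.$$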


We see that, as expected, the average mean-squared error is much lower for 
the credibility estimator, and this is achieved by allowing for some bias in the 
individual estimators. Further note that the credibility estimator is at its best 
near the mean of the prior distribution (5). □
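
The entries of Table 16.3 follow directly from the bias and variance formulas of the example. The sketch below evaluates them for a given $\theta_j$, assuming $Z = 0.5$, $\mu = 5$, and samples of size 10; the function name `table_row` is an illustrative choice.

```python
def table_row(theta, z=0.5, mu=5.0, n=10):
    """Return (MSE of sample mean, bias, variance, MSE) for z*Xbar + (1-z)*mu.

    Conditional on the Poisson mean theta, Var(Xbar) = theta / n.
    """
    mse_sample_mean = theta / n
    bias = (1 - z) * (mu - theta)          # equals 2.5 - 0.5*theta here
    variance = z ** 2 * theta / n          # equals 0.025*theta here
    return mse_sample_mean, bias, variance, bias ** 2 + variance


for theta in (3.510, 4.000, 5.002, 6.263):
    print(theta, [round(value, 3) for value in table_row(theta)])
```

The credibility estimator has the smaller mean-squared error whenever $\theta_j$ is close to the prior mean and loses to the sample mean only in the tails, as the first and last rows of the table show.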


We have seen that there is real value in using credibility. Our next task 
is to compare the linear credibility estimator to the Bayesian estimator. In 
most examples, this is difficult because the Bayesian estimates must be obtained
by approximate integration. An alternative would be to explore the
mean squared errors by simulation. This approach is taken in an illustration 
presented in Foundations of Casualty Actuarial Science [24], p. 467. In the 
following example we use the same illustration but employ an approximation 
that avoids approximate integration. It should also be noted that the linear 
credibility approach requires only assumptions or estimation of the first two 
moments while the Bayesian approach requires the distributions to be com- 
pletely specified. This nonparametric feature makes the linear approach more 
robust, which may compensate for any loss of accuracy. 


Example 16.32 Individual observations are samples of size 25 from an inverse
gamma distribution with $\alpha = 4$ and unknown scale parameter $\Theta$. The
prior distribution for $\Theta$ is gamma with mean 50 and variance 5,000. Compare
the linear credibility and Bayesian estimators.


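For the Bühlmann linear credibility estimator we have

$$\mu(\Theta) = \frac{\Theta}{3}, \quad \mu = E\left(\frac{\Theta}{3}\right) = \frac{50}{3}, \quad a = \mathrm{Var}\left(\frac{\Theta}{3}\right) = \frac{5{,}000}{9},$$

$$v(\Theta) = \frac{\Theta^2}{18}, \quad v = E\left(\frac{\Theta^2}{18}\right) = \frac{5{,}000 + 50^2}{18} = \frac{7{,}500}{18},$$

and so

$$Z = \frac{25}{25 + \dfrac{7{,}500/18}{5{,}000/9}} = \frac{100}{103}$$

and the credibility estimator is $\hat{\mu}_{\text{cred}} = (100\bar{X} + 50)/103$.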
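
The structural parameters above can be reproduced with exact rational arithmetic. The following is a minimal sketch that assumes the inverse gamma moments $E(X|\Theta) = \Theta/(\alpha-1)$ and $\mathrm{Var}(X|\Theta) = \Theta^2/[(\alpha-1)^2(\alpha-2)]$; the variable names are illustrative.

```python
from fractions import Fraction

alpha = 4                      # inverse gamma shape for the observations
n = 25                         # sample size
prior_mean = Fraction(50)      # E(Theta)
prior_var = Fraction(5000)     # Var(Theta)

# Hypothetical mean mu(Theta) = Theta / (alpha - 1).
mu = prior_mean / (alpha - 1)
a = prior_var / (alpha - 1) ** 2

# Process variance v(Theta) = Theta^2 / [(alpha - 1)^2 (alpha - 2)].
second_moment = prior_var + prior_mean ** 2
v = second_moment / ((alpha - 1) ** 2 * (alpha - 2))

k = v / a
Z = n / (n + k)
intercept = (1 - Z) * mu       # constant term of the credibility estimator

print(mu, a, v, k, Z, intercept)
```

This gives $k = 3/4$, $Z = 100/103$, and a constant term of $50/103$, in agreement with the estimator $(100\bar{X} + 50)/103$.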
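For the Bayesian estimator, the posterior density is

$$\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) \propto \theta^{99.5}\exp\left[-\theta\left(0.01 + \sum_{j=1}^{25} x_j^{-1}\right)\right],$$

which is a gamma density with parameters 100.5 and $\left(0.01 + \sum_{j=1}^{25} x_j^{-1}\right)^{-1}$.
The posterior mean is

$$\hat{\theta}_{\text{Bayes}} = \frac{100.5}{0.01 + \sum_{j=1}^{25} x_j^{-1}} \quad\text{and so}\quad \hat{\mu}_{\text{Bayes}} = \frac{33.5}{0.01 + \sum_{j=1}^{25} x_j^{-1}},$$

which is clearly a nonlinear estimator.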
With regard to accuracy, we can also consider the sample mean. Given the
value of $\theta$, the sample mean is unbiased with variance and mean squared error
$\theta^2/(18 \times 25) = \theta^2/450$. For the credibility estimator the bias is

$$\begin{aligned}
\text{bias}_\theta(\hat{\mu}_{\text{cred}}) &= E\left(\frac{100\bar{X}}{103} + \frac{50}{103} - \frac{\theta}{3}\,\Big|\,\theta\right) \\
&= \frac{100\theta}{309} + \frac{50}{103} - \frac{\theta}{3} \\
&= \frac{50}{103} - \frac{\theta}{103},
\end{aligned}$$

the variance is

$$\mathrm{Var}_\theta(\hat{\mu}_{\text{cred}}) = \frac{(100/103)^2\theta^2}{450},$$

and the mean-squared error is

$$\text{MSE}_\theta(\hat{\mu}_{\text{cred}}) = \frac{1}{103^2}\left(2{,}500 - 100\theta + \frac{10{,}450\theta^2}{450}\right).$$

For the Bayes estimate we observe that, given $\theta$, $1/X$ has a gamma distribution
with parameters 4 and $1/\theta$. Therefore, $\sum_{j=1}^{25} X_j^{-1}$ has a gamma
distribution with parameters 100 and $1/\theta$. We note that in the denominator
of $\hat{\mu}_{\text{Bayes}}$, the term 0.01 will usually be small relative to the sum. An
approximation can be created by ignoring this term, in which case $\hat{\mu}_{\text{Bayes}}$ has
approximately an inverse gamma distribution with parameters 100 and $33.5\theta$.
Then

$$\text{bias}_\theta(\hat{\mu}_{\text{Bayes}}) = \frac{33.5\theta}{99} - \frac{\theta}{3} = \frac{0.5\theta}{99},$$

$$\mathrm{Var}_\theta(\hat{\mu}_{\text{Bayes}}) = \frac{33.5^2\theta^2}{99^2(98)},$$

$$\text{MSE}_\theta(\hat{\mu}_{\text{Bayes}}) = \frac{33.5^2 + 49/2}{99^2(98)}\,\theta^2 = 0.00119391\theta^2.$$

If we compare the coefficients of $\theta^2$ in the MSE for the three estimators,
we see that they are 0.00222 for the sample mean, 0.00219 for the credibility
estimator, and 0.00119 for the Bayesian estimator. Thus for large $\theta$ the
credibility estimator is not much of an improvement over the sample mean,
but the Bayesian estimator cuts the mean squared error about in half. Calculated
values of these quantities for various percentiles from the gamma prior
distribution appear in Table 16.4. □
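
The three mean-squared error expressions of Example 16.32 are simple enough to evaluate directly. The sketch below codes them as functions of $\theta$ (the Bayes version uses the approximation that drops the 0.01 term) and evaluates them at the median of the prior, $\theta = 22.747$; the function names are illustrative.

```python
def mse_sample_mean(theta):
    """MSE of the sample mean of 25 inverse gamma (alpha = 4) observations."""
    return theta ** 2 / 450


def mse_credibility(theta):
    """MSE of (100*Xbar + 50)/103: squared bias plus variance."""
    return (2500 - 100 * theta + 10450 * theta ** 2 / 450) / 103 ** 2


def mse_bayes_approx(theta):
    """Approximate MSE of the Bayes estimator, ignoring the 0.01 term."""
    return (33.5 ** 2 + 49 / 2) * theta ** 2 / (99 ** 2 * 98)


theta = 22.747  # 50th percentile of the gamma prior, as listed in Table 16.4
print(round(mse_sample_mean(theta), 3),
      round(mse_credibility(theta), 3),
      round(mse_bayes_approx(theta), 3))
```

The values agree with the 50th percentile row of Table 16.4 (1.150, 1.154, and 0.618), and the coefficients of $\theta^2$ are approximately 0.00222, 0.00219, and 0.00119.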


The inferior behavior of the credibility estimator when compared with the 
Bayes estimator is due to the heavy tails of the two distributions. One way 
to lighten the tail is to work with the logarithm of the data. This idea was 
proposed in Foundations of Casualty Actuarial Science [24] and evaluated for 
the above example. The idea is to work with the logarithms of the data and 
Table 16.4 A comparison of the sample mean, credibility, and Bayes estimators

                          $\bar{X}$      $\hat{\mu}_{\text{cred}}$          $\hat{\mu}_{\text{Bayes}}$
Percentile   $\theta$        MSE       Bias      MSE       Bias      MSE
     1        0.008     0.000     0.485     0.236     0.000     0.000
     5        0.197     0.000     0.484     0.234     0.001     0.000
    10        0.790     0.001     0.478     0.230     0.004     0.001
    25        5.077     0.057     0.436     0.244     0.026     0.031
    50       22.747     1.150     0.265     1.154     0.115     0.618
    75       66.165     9.729    -0.157     9.195     0.334     5.227
    90      135.277    40.667    -0.828    39.018     0.683    21.849
    95      192.072    81.982    -1.379    79.178     0.970    44.046
    99      331.746   244.568    -2.735   238.011     1.675   131.397


use linear credibility to estimate the mean of the distribution of logarithms. 
The result is then exponentiated. Because this procedure is sure to introduce 
bias a multiplicative adjustment is made. The results are presented in the 
following example with many of the details left for Exercise 16.57. 


Example 16.33 (Example 16.32 continued) Obtain the log-credibility estimator
and evaluate its bias and mean-squared error.


Let $W_j = \ln X_j$. Then for the credibility on the logarithms

$$\begin{aligned}
\mu(\Theta) &= E(W|\Theta) \\
&= \int_0^\infty (\ln x)\,\Theta^4 x^{-5}e^{-\Theta/x}\,\tfrac{1}{6}\,dx \\
&= \int_0^\infty (\ln\Theta - \ln y)\,y^3e^{-y}\,\tfrac{1}{6}\,dy \\
&= \ln\Theta - \psi(4),
\end{aligned}$$

where the second integral was obtained using the substitution $y = \Theta/x$. The
last line follows from observing that the term $y^3e^{-y}/6$ is a gamma density
and thus integrates to 1 while the second term is the digamma function (see
Exercise 16.57) and using tables in [3] we have $\psi(4) = 1.25612$. The next


⁵By Jensen's inequality, $E(\ln X) \le \ln E(X)$, and therefore this procedure will underestimate
the true value.
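required quantity is

$$\begin{aligned}
v(\Theta) &= E(W^2|\Theta) - \mu(\Theta)^2 \\
&= \int_0^\infty (\ln x)^2\,\Theta^4 x^{-5}e^{-\Theta/x}\,\tfrac{1}{6}\,dx - [\ln\Theta - \psi(4)]^2 \\
&= \int_0^\infty (\ln\Theta - \ln y)^2\,y^3e^{-y}\,\tfrac{1}{6}\,dy - [\ln\Theta - \psi(4)]^2 \\
&= \psi'(4),
\end{aligned}$$

where $\psi'(4) = 0.283823$ is the trigamma function (see Exercise 16.57). Then

$$\begin{aligned}
\mu &= E[\ln\Theta - \psi(4)] \\
&= \int_0^\infty (\ln\theta)\,\frac{\theta^{-0.5}e^{-\theta/100}\,100^{-0.5}}{\Gamma(0.5)}\,d\theta - \psi(4) \\
&= \int_0^\infty (\ln 100 + \ln\lambda)\,\frac{\lambda^{-0.5}e^{-\lambda}}{\Gamma(0.5)}\,d\lambda - \psi(4) \\
&= \ln 100 + \psi(0.5) - \psi(4) = 1.38554.
\end{aligned}$$

Also,

$$v = E[\psi'(4)] = \psi'(4) = 0.283823,$$

$$a = \mathrm{Var}[\ln\Theta - \psi(4)] = \psi'(0.5) = 4.934802,$$

$$Z = \frac{25}{25 + \dfrac{0.283823}{4.934802}} = 0.997705.$$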
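The log-credibility estimate is

$$\hat{\mu}_{\text{log-cred}} = c\exp(0.997705\bar{W} + 0.00318024).$$

The value of $c$ is obtained by setting

$$\begin{aligned}
E(X) = \frac{50}{3} &= cE[\exp(0.997705\bar{W} + 0.00318024)] \\
&= ce^{0.00318024}\,E\left[\exp\left(\frac{0.997705}{25}\sum_{j=1}^{25}\ln X_j\right)\right] \\
&= ce^{0.00318024}\,E\left[E\left(\prod_{j=1}^{25} X_j^{0.997705/25}\,\Big|\,\Theta\right)\right].
\end{aligned}$$

Given $\Theta$, the $X_j$s are independent and so the expected product is the product
of the expected values. From Appendix A, the $k$th moment of the inverse
gamma distribution produces

$$\begin{aligned}
\frac{50}{3} &= ce^{0.00318024}\,E\left\{\left[\frac{\Theta^{0.997705/25}\,\Gamma\left(4 - \frac{0.997705}{25}\right)}{6}\right]^{25}\right\} \\
&= ce^{0.00318024}\left[\frac{\Gamma\left(4 - \frac{0.997705}{25}\right)}{6}\right]^{25}\frac{100^{0.997705}\,\Gamma(0.5 + 0.997705)}{\Gamma(0.5)},
\end{aligned}$$

which produces $c = 1.169318$ and

$$\hat{\mu}_{\text{log-cred}} = 1.173043(2.712051)^{\bar{W}}.$$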


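In order to evaluate the bias and mean-squared error for a given value of
$\theta$, we must obtain

$$\begin{aligned}
E(\hat{\mu}_{\text{log-cred}}|\Theta = \theta) &= 1.173043\,E\left(e^{\bar{W}\ln 2.712051}\,|\,\Theta = \theta\right) \\
&= 1.173043\,E\left[\prod_{j=1}^{25} X_j^{(\ln 2.712051)/25}\,\Big|\,\Theta = \theta\right] \\
&= 1.173043\,\theta^{\ln 2.712051}\left[\frac{\Gamma\left(4 - \frac{\ln 2.712051}{25}\right)}{6}\right]^{25}
\end{aligned}$$

and

$$\begin{aligned}
E(\hat{\mu}^2_{\text{log-cred}}|\Theta = \theta) &= 1.173043^2\,E\left(e^{2\bar{W}\ln 2.712051}\,|\,\Theta = \theta\right) \\
&= 1.173043^2\,\theta^{2\ln 2.712051}\left[\frac{\Gamma\left(4 - \frac{2\ln 2.712051}{25}\right)}{6}\right]^{25}.
\end{aligned}$$

The measures of quality are then

$$\text{bias}_\theta(\hat{\mu}_{\text{log-cred}}) = E(\hat{\mu}_{\text{log-cred}}|\Theta = \theta) - \tfrac{1}{3}\theta,$$

$$\text{MSE}_\theta(\hat{\mu}_{\text{log-cred}}) = E(\hat{\mu}^2_{\text{log-cred}}|\Theta = \theta) - [E(\hat{\mu}_{\text{log-cred}}|\Theta = \theta)]^2 + [\text{bias}_\theta(\hat{\mu}_{\text{log-cred}})]^2.$$

Values of these quantities are calculated for various values of $\theta$ in Table 16.5.
A comparison with Table 16.4 indicates that the log-credibility estimator is
almost as good as the Bayes estimator. □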
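
The quantities in Example 16.33 can be recomputed with the standard library alone. The sketch below is one way to do so under stated assumptions: the digamma and trigamma values come from the closed forms $\psi(4) = 1 + \tfrac12 + \tfrac13 - \gamma$, $\psi(0.5) = -\gamma - 2\ln 2$, $\psi'(4) = \pi^2/6 - (1 + \tfrac14 + \tfrac19)$, and $\psi'(0.5) = \pi^2/2$ rather than from tables, the Euler constant $\gamma$ is hard-coded, and all function and variable names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329
N, ALPHA = 25, 4                       # sample size, inverse gamma shape
PRIOR_SHAPE, PRIOR_SCALE = 0.5, 100.0  # gamma prior: mean 50, variance 5,000

psi_4 = 1 + 1 / 2 + 1 / 3 - EULER_GAMMA
psi_half = -EULER_GAMMA - 2 * math.log(2)
trigamma_4 = math.pi ** 2 / 6 - (1 + 1 / 4 + 1 / 9)
trigamma_half = math.pi ** 2 / 2

mu = math.log(PRIOR_SCALE) + psi_half - psi_4  # E[ln(Theta) - psi(4)]
v, a = trigamma_4, trigamma_half
Z = N / (N + v / a)
intercept = (1 - Z) * mu


def inv_gamma_factor(t):
    """E[(X/theta)^t] for one observation: Gamma(ALPHA - t) / Gamma(ALPHA)."""
    return math.exp(math.lgamma(ALPHA - t) - math.lgamma(ALPHA))


def prior_moment(t):
    """E(Theta^t) under the gamma prior."""
    ratio = math.exp(math.lgamma(PRIOR_SHAPE + t) - math.lgamma(PRIOR_SHAPE))
    return PRIOR_SCALE ** t * ratio


# c makes the estimator unbiased overall: E(X) = c * E[exp(Z*Wbar + intercept)].
mean_x = PRIOR_SHAPE * PRIOR_SCALE / (ALPHA - 1)
c = mean_x / (math.exp(intercept) * inv_gamma_factor(Z / N) ** N * prior_moment(Z))


def bias_and_mse(theta):
    """Conditional bias and mean-squared error of the log-credibility estimator."""
    scale = c * math.exp(intercept)
    m1 = scale * theta ** Z * inv_gamma_factor(Z / N) ** N
    m2 = scale ** 2 * theta ** (2 * Z) * inv_gamma_factor(2 * Z / N) ** N
    bias = m1 - theta / (ALPHA - 1)
    return bias, m2 - m1 ** 2 + bias ** 2


print(round(Z, 6), round(c, 6), [round(x, 3) for x in bias_and_mse(22.747)])
```

At the prior median $\theta = 22.747$ this reproduces, up to rounding, the bias and mean-squared error shown in Table 16.5, and the computed $Z$ and $c$ agree with the values quoted in the example.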


In practice, log-credibility is as easy to use as ordinary credibility. In either 
case, one of the computational methods of the next section would be used. 
For log-credibility, the logarithms of the observations are substituted for the 
observed values and then the final estimate is exponentiated. The bias is 
corrected by multiplying all the estimates by a constant such that the sample 
mean of the estimates matches the sample mean of the original data. 
Table 16.5 Bias and mean squared error for the log-credibility estimator

Percentile   $\theta$        Bias       MSE
     1        0.008     0.000     0.000
     5        0.197     0.001     0.000
    10        0.790     0.003     0.001
    25        5.077     0.012     0.034
    50       22.747     0.026     0.666
    75       66.165     0.023     5.604
    90      135.277    -0.028    23.346
    95      192.072    -0.091    46.995
    99      331.746    -0.295   139.908


16.4.8 Notes and References 


In this section, one of the two major criticisms of limited fluctuation credibility 
has been addressed. Through the use of the variance of the hypothetical 
means, we now have a means of relating the mean of the group of interest, 
$\mu(\theta)$, to the manual, or collective, premium, $\mu$. The development was also
mathematically sound in that the results followed directly from a specific 
model and objective. We have also seen that the additional restriction of a 
linear solution was not as bad as it might be in that often we still obtain 
the exact Bayesian solution. There has subsequently been a great deal of 
effort expended to generalize the model. With a sound basis for obtaining a 
credibility premium, we have but one remaining obstacle: how to numerically 
estimate the quantities a and v in the Bühlmann formulation, or how to 
specify the prior distribution in the Bayesian formulation. Those matters are 
addressed in the final section of this chapter. 

A historical review of credibility theory including a description of the lim- 
ited fluctuation and greatest accuracy approaches is provided by Norberg 
[100]. Since the classic paper of Bühlmann [18], there has developed a vast
literature on credibility theory in the actuarial literature. Other elementary 
introductions are given by Herzog [52] and Waters [135]. Other more advanced 
treatments are Goovaerts and Hoogstad [46] and Sundt [127]. An important 
generalization of the Bühlmann-Straub model is the Hachemeister [48] regression
model, which was not discussed here. See also Klugman [76]. The
material on exact credibility is taken from Jewell [66]. See also Ericson [34]. 
A special issue of Insurance: Abstracts and Reviews (Sundt [126]) contains an 
extensive list of papers on credibility. 
16.4.9 Exercises


16.22 Consider a die-spinner model. The first die has one “marked” face
and five “unmarked” faces whereas the second die has four “marked” faces
and two “unmarked” faces. There are three spinners, each with five equally
spaced sectors marked 3 or 8. The first spinner has one sector marked 3 and
four marked 8, the second has two marked 3 and three marked 8, and the third
has four marked 3 and one marked 8. One die and one spinner are selected
at random. If rolling the die produces an unmarked face, no claim occurs. If
a marked face occurs, there is a claim and then the spinner is spun once to
determine the amount of the claim.


(a) Determine $\pi(\theta)$ for each of the six die-spinner combinations.

(b) Determine the conditional distributions $f_{X|\Theta}(x|\theta)$ for the claim
sizes for each die-spinner combination.

(c) Determine the hypothetical means $\mu(\theta)$ and the process variances
$v(\theta)$ for each $\theta$.

(d) Determine the marginal probability that the claim $X_1$ on the first
iteration equals 3.

(e) Determine the posterior distribution $\pi_{\Theta|X_1}(\theta|3)$ of $\Theta$ using Bayes’
theorem.

(f) Use (16.25) to determine the conditional distribution $f_{X_2|X_1}(x_2|3)$
of the claims $X_2$ on the second iteration given that $X_1 = 3$ was
observed on the first iteration.

(g) Use (16.28) to determine the Bayesian premium $E(X_2|X_1 = 3)$.

(h) Determine the joint probability that $X_2 = x_2$ and $X_1 = 3$ for
$x_2 = 0, 3, 8$.

(i) Determine the conditional distribution $f_{X_2|X_1}(x_2|3)$ directly using
(16.23) and compare your answer to that of (f).

(j) Determine the Bayesian premium directly using (16.27) and compare
your answer to that of (g).

(k) Determine the structural parameters $\mu$, $v$, and $a$.

(l) Compute the Bühlmann credibility factor and the Bühlmann credibility
premium to approximate the Bayesian premium $E(X_2|X_1 = 3)$.


16.23 Three urns have balls marked 0, 1, and 2 in the proportions given in 
Table 16.6. An urn is selected at random, and two balls are drawn from that 
urn with replacement. A total of 2 on the two balls is observed. Two more 
balls are then drawn with replacement from the same urn, and it is of interest 
to predict the total on these next two balls. 


Table 16.6 Data for Exercise 16.23

Urn     0s      1s      2s
 1     0.40    0.35    0.25
 2     0.25    0.10    0.65
 3     0.50    0.15    0.35


(a) Determine $\pi(\theta)$.

(b) Determine the conditional distributions $f_{X|\Theta}(x|\theta)$ for the totals on
the two balls for each urn.

(c) Determine the hypothetical means $\mu(\theta)$ and the process variances
$v(\theta)$ for each $\theta$.

(d) Determine the marginal probability that the total $X_1$ on the first
two balls equals 2.

(e) Determine the posterior distribution $\pi_{\Theta|X_1}(\theta|2)$ using Bayes’ theorem.

(f) Use (16.25) to determine the conditional distribution $f_{X_2|X_1}(x_2|2)$
of the total $X_2$ on the next two balls drawn given that $X_1 = 2$ was
observed on the first two draws.

(g) Use (16.28) to determine the Bayesian premium $E(X_2|X_1 = 2)$.

(h) Determine the joint probability that the total $X_2$ on the next two
balls equals $x_2$ and the total $X_1$ on the first two balls equals 2 for
$x_2 = 0, 1, 2, 3, 4$.

(i) Determine the conditional distribution $f_{X_2|X_1}(x_2|2)$ directly using
(16.23) and compare your answer to that of (f).

(j) Determine the Bayesian premium directly using (16.27) and compare
your answer to that of (g).

(k) Determine the structural parameters $\mu$, $v$, and $a$.

(l) Determine the Bühlmann credibility factor and the Bühlmann credibility
premium.

(m) Show that the Bühlmann credibility factor is the same if each “exposure
unit” consists of one draw from the urn rather than two
draws.


16.24 Suppose that there are two types of policyholder: type A and type B.
Two-thirds of the total number of the policyholders are of type A and one-third
are of type B. For each type, the information on annual claim numbers
and severity are given as follows:

          Number of claims        Severity
Type      Mean    Variance      Mean    Variance
 A        0.2     0.2           200     4,000
 B        0.7     0.3           100     1,500

A policyholder has a total claim amount of 500 in the last four years.
Determine the credibility factor $Z$ and the credibility premium for next year
for this policyholder.


16.25 Let $\Theta_1$ represent the risk factor for claim numbers and let $\Theta_2$ represent
the risk factor for the claim severity for a line of insurance. Suppose that $\Theta_1$
and $\Theta_2$ are independent. Suppose also that given $\Theta_1 = \theta_1$ the claim number
$N$ is Poisson distributed and given $\Theta_2 = \theta_2$ the severity $Y$ is exponentially
distributed. The expectations of the hypothetical means and process variances
for the claim number and severity as well as the variance of the hypothetical
means for frequency are respectively

$$\mu_N = 0.1, \quad v_N = 0.1, \quad a_N = 0.05, \quad \mu_Y = 100, \quad v_Y = 25{,}000.$$

Three observations are made on a particular policyholder and we observe total
claims of 200. Determine the Bühlmann credibility factor and the Bühlmann
premium for this policyholder.


16.26 Suppose that $X_1,\ldots,X_n$ are independent (conditional on $\Theta$) and that

$$E(X_j|\Theta) = \beta_j\mu(\Theta) \quad\text{and}\quad \mathrm{Var}(X_j|\Theta) = \tau_j(\Theta) + \psi_j v(\Theta), \quad j = 1,\ldots,n.$$

Let

$$\mu = E[\mu(\Theta)], \quad v = E[v(\Theta)], \quad \tau_j = E[\tau_j(\Theta)], \quad a = \mathrm{Var}[\mu(\Theta)].$$

(a) Show that

$$E(X_j) = \beta_j\mu, \quad \mathrm{Var}(X_j) = \tau_j + \psi_j v + \beta_j^2 a,$$

and

$$\mathrm{Cov}(X_i, X_j) = \beta_i\beta_j a, \quad i \ne j.$$

(b) Solve the normal equations for $\tilde{\alpha}_0, \tilde{\alpha}_1,\ldots,\tilde{\alpha}_n$ to show that the credibility
premium satisfies

$$\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j = (1 - Z)E(X_{n+1}) + Z\beta_{n+1}\bar{X},$$

where

$$m_j = \beta_j^2(\tau_j + \psi_j v)^{-1}, \quad j = 1,\ldots,n,$$
$$m = m_1 + \cdots + m_n,$$
$$Z = am(1 + am)^{-1},$$
$$\bar{X} = \sum_{j=1}^{n} \frac{m_j}{m}\,\frac{X_j}{\beta_j}.$$


16.27 For the situation described in Exercise 12.72 determine $\mu(\theta)$ and the
Bayesian premium $E(X_{n+1}|\mathbf{x})$. Why is the Bayesian premium equal to the
credibility premium?


16.28 For the situation described in Exercise 12.73 determine $\mu(\theta)$ and the
Bayesian premium $E(X_{n+1}|\mathbf{x})$ and verify directly that the credibility premium
equals the Bayesian premium.


16.29 For the situation described in Exercise 12.74 determine $\mu(\theta)$ and the
Bayesian premium $E(X_{n+1}|\mathbf{x})$ and verify directly that the credibility premium
equals the Bayesian premium.


16.30 Consider the generalization of the linear exponential family given by

$$f(x;\theta,m) = \frac{p(m,x)e^{-mx\theta}}{[q(\theta)]^m}.$$

If $m$ is a parameter, this is called the exponential dispersion family. In
Exercise 12.79 it was shown that the mean of this random variable is $-q'(\theta)/q(\theta)$.
For this exercise, assume that $m$ is known.

(a) Consider the prior distribution

$$\pi(\theta) = \frac{[q(\theta)]^{-k}\exp(-\theta\mu k)}{c(\mu,k)}, \quad \theta_0 < \theta < \theta_1, \text{ with } \pi(\theta_0) = \pi(\theta_1).$$

Determine the Bayesian premium.

(b) Using the same prior, determine the Bühlmann premium.

(c) Show that the inverse Gaussian distribution is a member of the
exponential dispersion family.


16.31 Suppose that $X_1,\ldots,X_n$ are independent (conditional on $\Theta$) and

$$E(X_j|\Theta) = \tau^j\mu(\Theta) \quad\text{and}\quad \mathrm{Var}(X_j|\Theta) = \frac{\tau^{2j}v(\Theta)}{m_j}, \quad j = 1,\ldots,n.$$

Let $\mu = E[\mu(\Theta)]$, $v = E[v(\Theta)]$, $a = \mathrm{Var}[\mu(\Theta)]$, $k = v/a$, and $m = m_1 + \cdots + m_n$.

(a) Discuss when these assumptions may be appropriate.

(b) Show that

$$E(X_j) = \tau^j\mu, \quad \mathrm{Var}(X_j) = \tau^{2j}(a + v/m_j),$$

and

$$\mathrm{Cov}(X_i, X_j) = \tau^{i+j}a, \quad i \ne j.$$

(c) Solve the normal equations for $\tilde{\alpha}_0, \tilde{\alpha}_1,\ldots,\tilde{\alpha}_n$ to show that the credibility
premium satisfies

$$\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j = \frac{k}{k+m}\,\tau^{n+1}\mu + \frac{m}{k+m}\sum_{j=1}^{n} \frac{m_j}{m}\,\tau^{n+1-j}X_j.$$

(d) Give a verbal interpretation of the formula in (c).

(e) Suppose that

$$f_{X_j|\Theta}(x_j|\theta) = \frac{p(x_j, m_j, \tau)e^{-m_j x_j\theta\tau^{-j}}}{[q(\theta)]^{m_j}}.$$

Show that $E(X_j|\Theta) = \tau^j\mu(\Theta)$ and that $\mathrm{Var}(X_j|\Theta) = \tau^{2j}v(\Theta)/m_j$,
where $\mu(\theta) = -\frac{d}{d\theta}\ln q(\theta)$ and $v(\theta) = -\mu'(\theta)$.

(f) Prove that credibility is exact if $\Theta$ has pdf

$$\pi(\theta) = \frac{[q(\theta)]^{-k}e^{-\mu k\theta}}{c(\mu,k)}, \quad \theta_0 < \theta < \theta_1,$$

which satisfies $\pi(\theta_0) = \pi(\theta_1) = 0$.


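16.32 Suppose that given $\Theta = \theta$ the random variables $X_1,\ldots,X_n$ are independent
with Poisson pf

$$f_{X_j|\Theta}(x_j|\theta) = \frac{\theta^{x_j}e^{-\theta}}{x_j!}, \quad x_j = 0,1,2,\ldots.$$

(a) Let $S = X_1 + \cdots + X_n$. Show that $S$ has pf

$$f_S(s) = \int_0^\infty \frac{(n\theta)^s e^{-n\theta}}{s!}\,\pi(\theta)\,d\theta, \quad s = 0,1,2,\ldots,$$

where $\Theta$ has pdf $\pi(\theta)$.

(b) Show that the Bayesian premium is

$$E(X_{n+1}|\mathbf{X} = \mathbf{x}) = \frac{s+1}{n}\,\frac{f_S(s+1)}{f_S(s)},$$

where $s = \sum_{j=1}^{n} x_j$.

(c) Evaluate the distribution of $S$ in (a) when $\pi(\theta)$ is a gamma distribution.
What type of distribution is this?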


16.33 Suppose $X_j|\Theta$ is normally distributed with mean $\Theta$ and variance $v$ for
$j = 1,2,\ldots,n+1$. Further suppose $\Theta$ is normally distributed with mean $\mu$
and variance $a$. Thus,

$$f_{X_j|\Theta}(x_j|\theta) = (2\pi v)^{-1/2}\exp\left[-\frac{1}{2v}(x_j - \theta)^2\right], \quad -\infty < x_j < \infty,$$

and

$$\pi(\theta) = (2\pi a)^{-1/2}\exp\left[-\frac{1}{2a}(\theta - \mu)^2\right], \quad -\infty < \theta < \infty.$$

Determine the posterior distribution of $\Theta|\mathbf{X}$ and the predictive distribution
of $X_{n+1}|\mathbf{X}$. Then determine the Bayesian estimate of $E(X_{n+1}|\mathbf{X})$. Finally,
show that the Bayesian and Bühlmann estimates are equal.


16.34 (*) Your friend selected at random one of two urns and then she pulled 
a ball with number 4 on it from the urn. Then she replaced the ball in the urn. 
One of the urns contains four balls, numbered 1-4. The other urn contains six 
balls, numbered 1-6. Your friend will make another random selection from 
the same urn. 


(a) Estimate the expected value of the number on the next ball using 
the Bayesian method. 


(b) Estimate the expected number on the next ball using Bühlmann 
credibility. 


16.35 The number of claims for a randomly selected insured has the Poisson
distribution with parameter $\theta$. The parameter $\theta$ is distributed across the
population with pdf $\pi(\theta) = 3\theta^{-4}$, $\theta > 1$. For an individual, the parameter
does not change over time. A particular insured experienced a total of 20
claims in the previous two years.


(a) (*) Determine the Bühlmann credibility estimate for the future
expected claim frequency for this particular insured.


(b) Determine the Bayesian credibility estimate for the future expected
claim frequency for this particular insured.


16.36 (*) The distribution of payments to an insured is constant over time.
If the Bühlmann credibility assigned for one-half year of observation is 0.5,
determine the Bühlmann credibility to be assigned for three years.


16.37 (*) Three urns contain balls marked either 0 or 1. In urn A, 10% are 
marked 0; in urn B, 60% are marked 0; and in urn C, 80% are marked 0. 
An urn is selected at random and three balls selected with replacement. The 
total of the values is 1. Three more balls are selected with replacement from 
the same urn. 


(a) Determine the expected total of the three balls using Bayes’ theo- 
rem. 


(b) Determine the expected total of the three balls using Bühlmann
credibility.


16.38 (*) The number of claims follows the Poisson distribution with parameter
$\lambda$. A particular insured had three claims in the past three years.


(a) The value of $\lambda$ has pdf $f(\lambda) = 4\lambda^{-5}$, $\lambda > 1$. Determine the value
of $K$ used in Bühlmann’s credibility formula. Then use Bühlmann
credibility to estimate the claim frequency for this insured.

(b) The value of $\lambda$ has pdf $f(\lambda) = 1$, $0 < \lambda < 1$. Determine the value
of $K$ used in Bühlmann’s credibility formula. Then use Bühlmann
credibility to estimate the claim frequency for this insured.


16.39 (*) The number of claims follows the Poisson distribution with parameter
$h$. The value of $h$ has the gamma distribution with pdf $f(h) = he^{-h}$,
$h > 0$. Determine the Bühlmann credibility to be assigned to a single observation.
(The Bayes solution was obtained in Exercise 12.86.)


16.40 Consider the situation of Exercise 12.88. 


(a) Determine the expected number of claims in the second year using 
Bayesian credibility. 

(b) (*) Determine the expected number of claims in the second year 
using Bühlmann credibility.


16.41 (*) One spinner is selected at random from a group of three spinners. 
Each spinner is divided into six equally likely sectors. The number of sectors 
marked 0, 12, and 48, respectively, on each spinner is as follows: spinner A: 
2,2,2; spinner B: 3,2,1; spinner C: 4,1,1. A spinner is selected at random and 
a zero is obtained on the first spin. 


(a) Determine the Bühlmann credibility estimate of the expected value
of the second spin using the same spinner. 

(b) Determine the Bayesian credibility estimate of the expected value 
of the second spin using the same spinner. 


16.42 The number of claims in a year has the Poisson distribution with mean
$\lambda$. The parameter $\lambda$ has the uniform distribution over the interval (1,3).


(a) (*) Determine the probability that a randomly selected individual 
will have no claims. 

(b) (*) If an insured had one claim during the first year, estimate the 
expected number of claims for the second year using Bühlmann
credibility. 

(c) If an insured had one claim during the first year, estimate the ex- 
pected number of claims for the second year using Bayesian credi- 
bility. 


16.43 (*) Each of two classes, A and B, has the same number of risks. In 


class A the number of claims per risk per year has mean 4 and variance Š 


while the amount of a single claim has mean 4 and variance 20. In class B 
the number of claims per risk per year has mean 3 and variance Š while the
amount of a single claim has mean 2 and variance 5. A risk is selected at 
random from one of the two classes and is observed for four years. 


(a) Determine the value of Z for Bühlmann credibility for the observed 
pure premium. 


(b) Suppose the pure premium calculated from the four observations 
is 0.25. Determine the Bühlmann credibility estimate for the risk’s 
pure premium. 


16.44 (*) Let $X_1$ be the outcome of a single trial and let $E(X_2|X_1)$ be the
expected value of the outcome of a second trial. You are given the following
information:

                               Bühlmann estimate of     Bayesian estimate of
Outcome, $T$   $\Pr(X_1 = T)$      $E(X_2|X_1 = T)$           $E(X_2|X_1 = T)$
     1            1/3               2.72                     2.6
     8            1/3               7.71                     7.8
    12            1/3              10.57                      -

Determine the Bayesian estimate for $E(X_2|X_1 = 12)$.
16.45 Consider the situation of Exercise 12.90. 


(a) Determine the expected number of claims in the second year using
Bayesian credibility. 


(b) (*) Determine the expected number of claims in the second year 
using Bühlmann credibility.


16.46 Consider the situation of Exercise 12.91. 


(a) Determine the expected number of claims in the second year using 
Bayesian credibility. 


(b) Determine the expected number of claims in the second year using 
Bühlmann credibility.


16.47 Two spinners, $A_1$ and $A_2$, are used to determine the number of claims.
For spinner $A_1$ there is a 0.15 probability of one claim and 0.85 of no claim.
For spinner $A_2$ there is a 0.05 probability of one claim and 0.95 of no claim.
If there is a claim, one of two spinners, $B_1$ and $B_2$, is used to determine the
amount. Spinner $B_1$ produces a claim of 20 with probability 0.8 and 40 with
probability 0.2. Spinner $B_2$ produces a claim of 20 with probability 0.3 and 40
with probability 0.7. A spinner is selected at random from each of $A_1$, $A_2$ and
from $B_1$, $B_2$. Three observations from the selected pair yield claim amounts
of 0, 20, and 0.


(a) (*) Use Bühlmann credibility to separately estimate the expected
number of claims and the expected severity. Use these estimates to
estimate the expected value of the next observation from the same
pair of spinners.


(b) Use Bühlmann credibility once on the three observations to estimate
the expected value of the next observation from the same pair
of spinners.


(c) (*) Repeat parts (a) and (b) using Bayesian estimation.

(d) (*) For the same selected pair of spinners, determine

$$\lim_{n\to\infty} E(X_n|X_1 = X_2 = \cdots = X_{n-1} = 0).$$
16.48 (*) A portfolio of risks is such that all risks are normally distributed. 
Those of type A have a mean of 0.1 and a standard deviation of 0.03. Those 
of type B have a mean of 0.5 and a standard deviation of 0.05. Those of 
type C have a mean of 0.9 and a standard deviation of 0.01. There are an 


equal number of each type of risk. The observed value for a single risk is 0.12. 
Determine the Bayesian estimate of the same risk's expected value.


16.49 (*) You are given the following: 


1. The conditional distribution $f_{X|\Theta}(x|\theta)$ is a member of the linear
exponential family.

2. The prior distribution $\pi(\theta)$ is a conjugate prior for $f_{X|\Theta}(x|\theta)$.

3. $\mathrm{E}(X) = 1$.

4. $\mathrm{E}(X|X_1 = 4) = 2$, where $X_1$ is the value of a single observation.

5. The expected value of the process variance $\mathrm{E}[\mathrm{Var}(X|\Theta)] = 3$.

Determine the variance of the hypothetical means $\mathrm{Var}[\mathrm{E}(X|\Theta)]$.
16.50 (*) You are given the following: 

1. $X$ is a random variable with mean $\mu$ and variance $v$.

2. $\mu$ is a random variable with mean 2 and variance 4.

3. $v$ is a random variable with mean 8 and variance 32.


Determine the value of the Bühlmann credibility factor $Z$ after three
observations of $X$.
16.51 The amount of an individual claim has the exponential distribution
with pdf $f_{Y|\Lambda}(y|\lambda) = \lambda^{-1}e^{-y/\lambda}$, $y, \lambda > 0$. The parameter $\lambda$ has the inverse
gamma distribution with pdf $\pi(\lambda) = 400\lambda^{-3}e^{-20/\lambda}$.


(a) (*) Determine the unconditional expected value, E(X). 


(b) Suppose two claims were observed with values 15 and 25. Determine
the Bühlmann credibility estimate of the expected value of
the next claim from the same insured.


(c) Repeat part (b), but determine the Bayesian credibility estimate. 


16.52 The distribution of the number of claims is binomial with $n = 1$ and $\theta$
unknown. The parameter $\theta$ is distributed with mean 0.25 and variance 0.07.
Determine the value of $Z$ for a single observation using Bühlmann's credibility
formula.


16.53 (*) Consider four marksmen. Each is firing at a target that is 100 feet 
away. The four targets are 2 feet apart (that is, they lie on a straight line at 
positions 0, 2, 4, and 6 in feet). The marksmen miss to the left or right, never 
high or low. Each marksman’s shot follows a normal distribution with mean 
at his target and a standard deviation that is a constant times the distance to 
the target. At 100 feet the standard deviation is 3 feet. By observing where 
an unknown marksman’s shot hits the straight line, you are to estimate the 
location of the next shot by the same marksman. 


(a) Determine the Bühlmann credibility assigned to a single shot of a
randomly selected marksman.


(b) Which of the following will increase Bühlmann credibility the most?


i. Revise the targets to 0, 4, 8, and 12. 

ii. Move the marksmen to 60 feet from the targets. 

iii. Revise targets to 2, 2, 10, 10. 

iv. Increase the number of observations from the same marksman 
to three. 

v. Move two of the marksmen to 50 feet from the targets and 
increase the number of observations from the same marksman 
to two. 


16.54 (*) Risk 1 produces claims of amounts 100, 1,000, and 20,000 with 
probabilities 0.5, 0.3, and 0.2, respectively. For risk 2 the probabilities are 
0.7, 0.2, and 0.1. Risk 1 is twice as likely as risk 2 of being observed. A claim 
of 100 is observed, but the observed risk is unknown. 


(a) Determine the Bayesian credibility estimate of the expected value 
of the second claim amount from the same risk. 


(b) Determine the Bühlmann credibility estimate of the expected value 
of the second claim amount from the same risk. 
16.55 (*) You are given the following:


1. The number of claims for a single insured follows a Poisson distribution
with mean $M$.


2. The amount of a single claim has an exponential distribution with pdf
$f_{X|\Lambda}(x|\lambda) = \lambda^{-1}e^{-x/\lambda}$, $x, \lambda > 0$.


3. $M$ and $\Lambda$ are independent.


4. $\mathrm{E}(M) = 0.10$ and $\mathrm{Var}(M) = 0.0025$.


5. $\mathrm{E}(\Lambda) = 1{,}000$ and $\mathrm{Var}(\Lambda) = 640{,}000$.


6. The number of claims and the claim amounts are independent.


(a) Determine the expected value of the pure premium’s process vari- 
ance for a single risk. 


(b) Determine the variance of the hypothetical means for the pure pre- 
mium. 


16.56 In Example 16.24, if $\rho = 0$, then $Z = 0$, and the estimator is $\mu$. That
is, the data should be ignored. However, as $\rho$ increases toward 1, $Z$ increases
to 1, and the sample mean becomes the preferred predictor of $X_{n+1}$. Explain
why this is a reasonable result.


16.57 In this exercise you are asked to derive a number of the items from 
Example 16.33. 


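(a) The digamma function is formally defined as $\Psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$.
From this definition, show that

\[ \Psi(\alpha) = \frac{1}{\Gamma(\alpha)} \int_0^\infty (\ln x)\, x^{\alpha-1} e^{-x}\, dx. \]

(b) The trigamma function is formally defined as $\Psi'(\alpha)$. Derive an
expression for

\[ \int_0^\infty (\ln x)^2\, x^{\alpha-1} e^{-x}\, dx \]

in terms of trigamma, digamma, and gamma functions.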


16.58 Consider the following situation, which is similar to Examples 16.32
and 16.33. Individual observations are samples of size 25 from a lognormal
distribution with $\mu$ unknown and $\sigma = 2$. The prior distribution for $\Theta$ (using $\Theta$
to represent the unknown value of $\mu$) is normal with mean 5 and standard
deviation 1. Determine the Bayes, credibility, and log-credibility estimators and
compare their mean-squared errors, evaluating them at the same percentiles
as used in Examples 16.32 and 16.33.
16.59 In the following, let the random vector $\mathbf{X}$ represent all the past data
and let $X_{n+1}$ represent the next observation. Let $g(\mathbf{X})$ be any function of the
past data.


(a) Prove that the following is true.

\[ \mathrm{E}\{[X_{n+1} - g(\mathbf{X})]^2\} = \mathrm{E}\{[X_{n+1} - \mathrm{E}(X_{n+1}|\mathbf{X})]^2\} + \mathrm{E}\{[\mathrm{E}(X_{n+1}|\mathbf{X}) - g(\mathbf{X})]^2\}, \]

where the expectation is taken over $(X_{n+1}, \mathbf{X})$.


(b) Show that setting $g(\mathbf{X})$ equal to the Bayesian premium (the mean
of the predictive distribution) minimizes the expected squared error,
$\mathrm{E}\{[X_{n+1} - g(\mathbf{X})]^2\}$.


(c) Show that, if g(X) is restricted to be a linear function of the past 
data, then the expected squared error is minimized by the credi- 
bility premium. 


16.5 EMPIRICAL BAYES PARAMETER ESTIMATION 


In the previous section a modeling methodology was proposed which suggested 
the use of either the Bayesian or credibility premium as a way to incorporate 
past data into the prospective rate. There is a practical problem associated 
with the use of these models which has not yet been addressed. 

In the examples, we were able to obtain numerical values for the quantities
of interest because the input distributions $f_{X_j|\Theta}(x_j|\theta)$ and $\pi(\theta)$ were assumed
to be known. These examples, while useful for illustrating the methodology,
can hardly be expected to represent the business of an insurance portfolio
accurately. More practical models of necessity involve parameters
that must be chosen to ensure close agreement between the model and
reality. Examples include the Poisson-gamma model (Example 16.15),
where the gamma parameters $\alpha$ and $\beta$ need to be selected, and the Bühlmann
or Bühlmann-Straub parameters $\mu$, $v$, and $a$. Assignment of numerical values
to the Bayesian or credibility premium requires that these parameters be
replaced by numerical values.

In general, the unknown parameters are those associated with the structure
density $\pi(\theta)$, and hence we refer to these as structural parameters. The
terminology we use follows the Bayesian framework of the previous section.
Strictly speaking, in the Bayesian context all structural parameters are
assumed known and there is no need for estimation. An example of this is the
Poisson-gamma model, where our prior information about the structural density was
quantified by the choice of $\alpha = 36$ and $\beta = 1/240$. For our purposes, this fully
Bayesian approach is often unsatisfactory (e.g., when there is little or no prior
information available, such as with a new line of insurance) and we may need
to use the data at hand to estimate the structural (prior) parameters. This
approach is called empirical Bayes estimation.

We refer to the situation where $\pi(\theta_i)$ and $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$ are left largely
unspecified (for example, in the Bühlmann or Bühlmann-Straub models where
only the first two moments need be known) as the nonparametric case. This
situation is dealt with in Section 16.5.1. If $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$ is assumed to be of
parametric form (e.g., Poisson, normal, etc.) but not $\pi(\theta_i)$, then we refer to
the problem as being of a semiparametric nature, and this is considered in
Section 16.5.2. Finally, the (technically more difficult) fully parametric case
where both $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$ and $\pi(\theta_i)$ are assumed to be of parametric form is
briefly discussed in Section 16.5.3.

The decision as to whether to select a parametric model depends partially
on the situation at hand and partially on the judgment and knowledge of
the person doing the analysis. For example, an analysis based on claim counts
might involve the assumption that $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$ is of Poisson form, whereas the
choice of a parametric model for $\pi(\theta_i)$ may not be reasonable.

Any parametric assumptions should be reflected (as far as possible) in
parametric estimation. For example, in the Poisson case, because the mean and
variance are equal, the same estimate would normally be used for both.
Nonparametric estimators would normally be no more efficient than estimators
appropriate for the parametric model selected, assuming that the model
selected is appropriate. This notion is relevant for the decision as to whether
to select a parametric model.

Finally, nonparametric models have the advantage of being appropriate for 
a wide variety of situations, a fact which may well eliminate the extra burden 
of a parametric assumption (often a stronger assumption than is reasonable). 

In this section the data are assumed to be of the following form. For
each of $r > 1$ policyholders we have the observed losses per unit of exposure
$\mathbf{X}_i = (X_{i1},\ldots,X_{in_i})^T$ for $i = 1,\ldots,r$. The random vectors $\{\mathbf{X}_i, i = 1,\ldots,r\}$
are assumed to be statistically independent (experience of different policyholders
is assumed to be independent). The (unknown) risk parameter for the $i$th
policyholder is $\theta_i$, $i = 1,\ldots,r$, and it is assumed further that $\theta_1,\ldots,\theta_r$ are
realizations of the independent and identically distributed random variables $\Theta_i$
with structural density $\pi(\theta_i)$. For fixed $i$, the (conditional) random variables
$X_{ij}|\Theta_i$ are assumed to be independent with pf $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$, $j = 1,\ldots,n_i$.

Two particularly common cases produce this data format. The first is 
classification rate making or experience rating. In either, i indexes the classes 
or groups and j indexes the individual members. The second case is like the 
first where i continues to index the class or group, but now j is the year and 
the observation is the average loss for that year. An example of the second 
setting is Meyers [91], where i = 1,...,319 employment classifications are 
studied over j = 1,2,3 years. Regardless of the potential settings, we will 
refer to the r entities as policyholders. 

There may also be a known exposure vector $\mathbf{m}_i = (m_{i1}, m_{i2}, \ldots, m_{in_i})^T$
for policyholder $i$, where $i = 1,\ldots,r$. If not (and if it is appropriate) one may
set $m_{ij} = 1$ in what follows for all $i$ and $j$. For notational convenience let

\[ m_i = \sum_{j=1}^{n_i} m_{ij}, \quad i = 1,\ldots,r, \]

be the total past exposure for policyholder $i$, and let

\[ \bar{X}_i = \frac{1}{m_i}\sum_{j=1}^{n_i} m_{ij}X_{ij}, \quad i = 1,\ldots,r, \]

be the past average loss experience. Furthermore, the total exposure is

\[ m = \sum_{i=1}^{r} m_i = \sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij} \]

and the overall average losses are

\[ \bar{X} = \frac{1}{m}\sum_{i=1}^{r} m_i\bar{X}_i = \frac{1}{m}\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}X_{ij}. \tag{16.54} \]

The parameters that need to be estimated depend on what is assumed
about the distributions $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$ and $\pi(\theta_i)$.

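For the Bühlmann-Straub formulation there are additional quantities of
interest. The hypothetical mean (assumed not to depend on $j$) is

\[ \mathrm{E}(X_{ij}|\Theta_i = \theta_i) = \mu(\theta_i) \]

and the process variance is

\[ \mathrm{Var}(X_{ij}|\Theta_i = \theta_i) = \frac{v(\theta_i)}{m_{ij}}. \]

The structural parameters are

\[ \mu = \mathrm{E}[\mu(\Theta_i)], \quad v = \mathrm{E}[v(\Theta_i)], \]

and

\[ a = \mathrm{Var}[\mu(\Theta_i)]. \]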


The approach to be followed in this section is to estimate $\mu$, $v$, and $a$ (when
unknown) from the data. The credibility premium for next year's losses (per
exposure unit) for policyholder $i$ is

\[ Z_i\bar{X}_i + (1 - Z_i)\mu, \quad i = 1,\ldots,r, \tag{16.55} \]

where

\[ Z_i = \frac{m_i}{m_i + k}, \quad k = \frac{v}{a}. \]

If estimators of $\mu$, $v$, and $a$ are denoted by $\hat{\mu}$, $\hat{v}$, and $\hat{a}$, respectively, then one
would replace the credibility premium (16.55) by its estimator

\[ \hat{Z}_i\bar{X}_i + (1 - \hat{Z}_i)\hat{\mu}, \tag{16.56} \]

where

\[ \hat{Z}_i = \frac{m_i}{m_i + \hat{k}}, \quad \hat{k} = \frac{\hat{v}}{\hat{a}}. \]

Note that, even if $\hat{v}$ and $\hat{a}$ are unbiased estimators of $v$ and $a$, the same
cannot be said of $\hat{k}$ and $\hat{Z}_i$. Finally, the credibility premium to cover all
$m_{i,n_i+1}$ exposure units for policyholder $i$ in the next year would be (16.56)
multiplied by $m_{i,n_i+1}$.


16.5.1 Nonparametric estimation 


In this section we consider unbiased estimation of $\mu$, $v$, and $a$. To illustrate
the ideas, let us begin with the following simple Bühlmann-type example.


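Example 16.34 Suppose that $n_i = n > 1$ for all $i$ and $m_{ij} = 1$ for all $i$ and
$j$. That is, for policyholder $i$, we have the loss vector

\[ \mathbf{X}_i = (X_{i1},\ldots,X_{in})^T, \quad i = 1,\ldots,r. \]

Furthermore, conditional on $\Theta_i = \theta_i$, $X_{ij}$ has mean

\[ \mu(\theta_i) = \mathrm{E}(X_{ij}|\Theta_i = \theta_i) \]

and variance

\[ v(\theta_i) = \mathrm{Var}(X_{ij}|\Theta_i = \theta_i), \]

and $X_{i1},\ldots,X_{in}$ are independent (conditionally). Also, different policyholders'
past data are independent, so that if $i \ne s$, then $X_{ij}$ and $X_{st}$ are
independent. In this case

\[ \bar{X}_i = n^{-1}\sum_{j=1}^{n} X_{ij} \quad\text{and}\quad \bar{X} = r^{-1}\sum_{i=1}^{r}\bar{X}_i = (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n} X_{ij}. \]

Determine unbiased estimators of the Bühlmann quantities.


An unbiased estimator of $\mu$ is

\[ \hat{\mu} = \bar{X} \]

because

\begin{align*}
\mathrm{E}(\hat{\mu}) &= (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n}\mathrm{E}(X_{ij}) = (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n}\mathrm{E}[\mathrm{E}(X_{ij}|\Theta_i)] \\
&= (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n}\mathrm{E}[\mu(\Theta_i)] = (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n}\mu = \mu.
\end{align*}

To estimate $v$, consider

\[ \hat{v}_i = \frac{1}{n-1}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2. \]

Recall that for fixed $i$ the random variables $X_{i1},\ldots,X_{in}$ are independent,
conditional on $\Theta_i = \theta_i$. Thus, $\hat{v}_i$ is an unbiased estimate of $\mathrm{Var}(X_{ij}|\Theta_i = \theta_i) = v(\theta_i)$.
Unconditionally,

\[ \mathrm{E}(\hat{v}_i) = \mathrm{E}[\mathrm{E}(\hat{v}_i|\Theta_i)] = \mathrm{E}[v(\Theta_i)] = v \]

and $\hat{v}_i$ is unbiased for $v$. Hence an unbiased estimator of $v$ is

\[ \hat{v} = \frac{1}{r}\sum_{i=1}^{r}\hat{v}_i = \frac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2. \tag{16.57} \]

We now turn to estimation of the parameter $a$. Begin with

\[ \mathrm{E}(\bar{X}_i|\Theta_i = \theta_i) = n^{-1}\sum_{j=1}^{n}\mathrm{E}(X_{ij}|\Theta_i = \theta_i) = n^{-1}\sum_{j=1}^{n}\mu(\theta_i) = \mu(\theta_i). \]

Thus,

\[ \mathrm{E}(\bar{X}_i) = \mathrm{E}[\mathrm{E}(\bar{X}_i|\Theta_i)] = \mathrm{E}[\mu(\Theta_i)] = \mu \]

and

\begin{align*}
\mathrm{Var}(\bar{X}_i) &= \mathrm{Var}[\mathrm{E}(\bar{X}_i|\Theta_i)] + \mathrm{E}[\mathrm{Var}(\bar{X}_i|\Theta_i)] \\
&= \mathrm{Var}[\mu(\Theta_i)] + \mathrm{E}\left[\frac{v(\Theta_i)}{n}\right] = a + \frac{v}{n}.
\end{align*}

Therefore, $\bar{X}_1,\ldots,\bar{X}_r$ are independent with common mean $\mu$ and common
variance $a + v/n$. Their sample average is $\bar{X} = r^{-1}\sum_{i=1}^{r}\bar{X}_i$. Consequently,
an unbiased estimator of $a + v/n$ is $(r-1)^{-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2$. Because we
already have an unbiased estimator of $v$ given above, an unbiased estimator
of $a$ is given by

\begin{align*}
\hat{a} &= \frac{1}{r-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2 - \frac{\hat{v}}{n} \\
&= \frac{1}{r-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2 - \frac{1}{rn(n-1)}\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2. \tag{16.58}
\end{align*}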

These estimators might look familiar. Consider a one-factor analysis of
variance in which each policyholder represents a treatment. The estimator
for $v$ (16.57) is the within (also called the error) mean square. The first term
in the estimator for $a$ (16.58) is the between (also called the treatment) mean
square divided by $n$. The hypothesis that all treatments have the same mean
is accepted when the between mean square is small relative to the within mean
square, that is, when $\hat{a}$ is small relative to $\hat{v}$. But that implies $\hat{Z}$ will be near
zero and little credibility will be given to each $\bar{X}_i$. This is as it should be
when the policyholders are essentially identical.

Due to the subtraction in (16.58), it is possible that $\hat{a}$ could be negative.
When that happens, it is customary to set $\hat{a} = \hat{Z} = 0$. This case is equivalent
to the $F$ test statistic in the analysis of variance being less than 1, a case that
always leads to an acceptance of the hypothesis of equal means.
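
The link to the analysis of variance can be written out in a few lines. The following is only a sketch based on the standard one-way ANOVA mean squares; the labels BMS, WMS, and $F$ are introduced here for illustration and are not notation used elsewhere in this section.

```latex
\begin{align*}
\mathrm{WMS} &= \hat{v}, \qquad
\mathrm{BMS} = \frac{n}{r-1}\sum_{i=1}^{r}(\bar{X}_i-\bar{X})^2 = n\hat{a}+\hat{v}
  \quad\text{by (16.58)},\\
F &= \frac{\mathrm{BMS}}{\mathrm{WMS}} = 1+\frac{n\hat{a}}{\hat{v}},\\
\hat{Z} &= \frac{n}{n+\hat{v}/\hat{a}} = \frac{n\hat{a}}{n\hat{a}+\hat{v}}
  = 1-\frac{1}{F} \qquad (\hat{a}>0).
\end{align*}
```

Under these definitions $\hat{a} < 0$ exactly when $F < 1$, and a large $F$ (strong evidence that the policyholders differ) pushes $\hat{Z}$ toward 1.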


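Example 16.35 (Example 16.34 continued) As a numerical illustration, suppose
we have $r = 2$ policyholders with $n = 3$ years experience for each. Let
the losses be $\mathbf{x}_1 = (3,5,7)^T$ and $\mathbf{x}_2 = (6,12,9)^T$. Estimate the Bühlmann
credibility premiums for each policyholder.


We have

\[ \bar{X}_1 = \tfrac{1}{3}(3+5+7) = 5, \quad \bar{X}_2 = \tfrac{1}{3}(6+12+9) = 9, \]

and so $\bar{X} = \tfrac{1}{2}(5+9) = 7$. Then $\hat{\mu} = 7$. We next have

\begin{align*}
\hat{v}_1 &= \tfrac{1}{2}[(3-5)^2 + (5-5)^2 + (7-5)^2] = 4, \\
\hat{v}_2 &= \tfrac{1}{2}[(6-9)^2 + (12-9)^2 + (9-9)^2] = 9,
\end{align*}

and so $\hat{v} = \tfrac{1}{2}(4+9) = \tfrac{13}{2}$. Then

\[ \hat{a} = [(5-7)^2 + (9-7)^2] - \tfrac{1}{3}\hat{v} = \tfrac{35}{6}. \]

Next, $\hat{k} = \hat{v}/\hat{a} = \tfrac{39}{35}$ and the estimated credibility factor is $\hat{Z} = 3/(3+\hat{k}) = \tfrac{35}{48}$.
The estimated credibility premiums are

\begin{align*}
\hat{Z}\bar{X}_1 + (1-\hat{Z})\hat{\mu} &= \left(\tfrac{35}{48}\right)(5) + \left(\tfrac{13}{48}\right)(7) = \tfrac{133}{24}, \\
\hat{Z}\bar{X}_2 + (1-\hat{Z})\hat{\mu} &= \left(\tfrac{35}{48}\right)(9) + \left(\tfrac{13}{48}\right)(7) = \tfrac{203}{24}
\end{align*}

for policyholders 1 and 2 respectively. □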
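
The arithmetic of Example 16.35 can be checked with a short script. This is a minimal sketch of the estimators (16.57) and (16.58) for equal sample sizes and unit exposures; the function name `buhlmann_estimates` and its return layout are choices made here for illustration, and exact fractions are used so that the results can be compared directly with the values above.

```python
from fractions import Fraction


def buhlmann_estimates(data):
    """Nonparametric Buhlmann estimates for r policyholders, n observations each.

    `data` is a list of r equal-length lists of losses (unit exposures assumed).
    Returns (mu_hat, v_hat, a_hat, z_hat, premiums).
    """
    r = len(data)
    n = len(data[0])
    means = [Fraction(sum(row), n) for row in data]   # policyholder means
    mu_hat = sum(means) / r                           # overall mean

    # (16.57): pooled within-policyholder variance
    within = sum((x - m) ** 2 for row, m in zip(data, means) for x in row)
    v_hat = within / Fraction(r * (n - 1))

    # (16.58): between-policyholder variance less v_hat / n
    a_hat = sum((m - mu_hat) ** 2 for m in means) / (r - 1) - v_hat / n

    # Customary convention: a negative (or zero) a_hat gives zero credibility
    z_hat = n / (n + v_hat / a_hat) if a_hat > 0 else Fraction(0)
    premiums = [z_hat * m + (1 - z_hat) * mu_hat for m in means]
    return mu_hat, v_hat, a_hat, z_hat, premiums


mu_hat, v_hat, a_hat, z_hat, premiums = buhlmann_estimates([[3, 5, 7], [6, 12, 9]])
print(mu_hat, v_hat, a_hat, z_hat, premiums)
```

Working in `Fraction` rather than floating point is a design choice for this small example only; for real data ordinary floats would normally be adequate.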


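We now turn to the more general Bühlmann-Straub setup described earlier
in this section. We have $\mathrm{E}(X_{ij}) = \mathrm{E}[\mathrm{E}(X_{ij}|\Theta_i)] = \mathrm{E}[\mu(\Theta_i)] = \mu$. Thus,

\[ \mathrm{E}(\bar{X}_i|\Theta_i) = \sum_{j=1}^{n_i}\frac{m_{ij}}{m_i}\mathrm{E}(X_{ij}|\Theta_i) = \sum_{j=1}^{n_i}\frac{m_{ij}}{m_i}\mu(\Theta_i) = \mu(\Theta_i), \]

implying that

\[ \mathrm{E}(\bar{X}_i) = \mathrm{E}[\mathrm{E}(\bar{X}_i|\Theta_i)] = \mathrm{E}[\mu(\Theta_i)] = \mu. \]

Finally,

\[ \mathrm{E}(\bar{X}) = \frac{1}{m}\sum_{i=1}^{r} m_i\mathrm{E}(\bar{X}_i) = \frac{1}{m}\sum_{i=1}^{r} m_i\mu = \mu \]

and so an obvious unbiased estimator of $\mu$ is

\[ \hat{\mu} = \bar{X}. \tag{16.59} \]

Now, $\mathrm{E}(X_{ij}|\Theta_i) = \mu(\Theta_i)$ and $\mathrm{Var}(X_{ij}|\Theta_i) = v(\Theta_i)/m_{ij}$ for $j = 1,\ldots,n_i$.
Consider

\[ \hat{v}_i = \frac{\sum_{j=1}^{n_i} m_{ij}(X_{ij} - \bar{X}_i)^2}{n_i - 1}, \quad i = 1,\ldots,r. \tag{16.60} \]

Condition on $\Theta_i$ and use (16.12) with $\beta = 0$ and $\alpha = v(\Theta_i)$. Then $\mathrm{E}(\hat{v}_i|\Theta_i) = v(\Theta_i)$.
But this means that, unconditionally,

\[ \mathrm{E}(\hat{v}_i) = \mathrm{E}[\mathrm{E}(\hat{v}_i|\Theta_i)] = \mathrm{E}[v(\Theta_i)] = v \]

and so $\hat{v}_i$ is unbiased for $v$ for $i = 1,\ldots,r$. Another unbiased estimator for
$v$ is then the weighted average $\hat{v} = \sum_{i=1}^{r} w_i\hat{v}_i$, where $\sum_{i=1}^{r} w_i = 1$. If we
choose weights proportional to $n_i - 1$, we weight the original $X_{ij}$s by $m_{ij}$.
That is, with $w_i = (n_i - 1)/\sum_{i=1}^{r}(n_i - 1)$, we obtain an unbiased estimator
of $v$, namely,

\[ \hat{v} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}(X_{ij} - \bar{X}_i)^2}{\sum_{i=1}^{r}(n_i - 1)}. \tag{16.61} \]

We now turn to estimation of $a$. Recall that for fixed $i$ the random variables
$X_{i1},\ldots,X_{in_i}$ are independent, conditional on $\Theta_i$. Thus,

\[ \mathrm{Var}(\bar{X}_i|\Theta_i) = \sum_{j=1}^{n_i}\left(\frac{m_{ij}}{m_i}\right)^2\mathrm{Var}(X_{ij}|\Theta_i) = \sum_{j=1}^{n_i}\left(\frac{m_{ij}}{m_i}\right)^2\frac{v(\Theta_i)}{m_{ij}} = \frac{v(\Theta_i)}{m_i}. \]

But this means that, unconditionally,

\begin{align*}
\mathrm{Var}(\bar{X}_i) &= \mathrm{Var}[\mathrm{E}(\bar{X}_i|\Theta_i)] + \mathrm{E}[\mathrm{Var}(\bar{X}_i|\Theta_i)] \\
&= \mathrm{Var}[\mu(\Theta_i)] + \mathrm{E}\left[\frac{v(\Theta_i)}{m_i}\right] = a + \frac{v}{m_i}. \tag{16.62}
\end{align*}

To summarize, $\bar{X}_1,\ldots,\bar{X}_r$ are independent with common mean $\mu$ and
variances $\mathrm{Var}(\bar{X}_i) = a + v/m_i$. Furthermore, $\bar{X} = m^{-1}\sum_{i=1}^{r} m_i\bar{X}_i$. Now, (16.12)
may again be used with $\beta = a$ and $\alpha = v$ to yield

\[ \mathrm{E}\left[\sum_{i=1}^{r} m_i(\bar{X}_i - \bar{X})^2\right] = a\left(m - m^{-1}\sum_{i=1}^{r} m_i^2\right) + v(r-1). \]

An unbiased estimator for $a$ may be obtained by replacing $v$ by an unbiased
estimator $\hat{v}$ and "solving" for $a$. That is, an unbiased estimator of $a$ is

\[ \hat{a} = \left(m - m^{-1}\sum_{i=1}^{r} m_i^2\right)^{-1}\left[\sum_{i=1}^{r} m_i(\bar{X}_i - \bar{X})^2 - \hat{v}(r-1)\right] \tag{16.63} \]

with $\hat{v}$ given by (16.61). An alternative form of (16.63) is given in Exercise
16.67.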

Some remarks are in order at this point. Equations (16.59), (16.61), and
(16.63) provide unbiased estimators for $\mu$, $v$, and $a$, respectively. They are
nonparametric, requiring no distributional assumptions. They are certainly
not the only (unbiased) estimators that could be used, and it is possible
that $\hat{a} < 0$. In this case, $a$ is likely to be close to 0, and it makes sense
to set $\hat{Z} = 0$. Furthermore, the ordinary Bühlmann estimators of Example
16.34 are recovered with $m_{ij} = 1$ and $n_i = n$. Finally, as may be
seen from Example 16.41, these estimators are essentially maximum likelihood
estimators in the case where $X_{ij}|\Theta_i$ and $\Theta_i$ are both normally distributed, and
thus the estimators have good statistical properties.

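There is one problem with using the formulas developed above. In the past, the
data from the $i$th policyholder were collected on an exposure of $m_i$. Total losses
on all policyholders were $TL = \sum_{i=1}^{r} m_i\bar{X}_i$. If we had charged the credibility
premium as given above, the total premium would have been

\begin{align*}
TP &= \sum_{i=1}^{r} m_i[\hat{Z}_i\bar{X}_i + (1 - \hat{Z}_i)\hat{\mu}] \\
&= \sum_{i=1}^{r} m_i(1 - \hat{Z}_i)(\hat{\mu} - \bar{X}_i) + \sum_{i=1}^{r} m_i\bar{X}_i \\
&= \sum_{i=1}^{r} m_i\frac{\hat{k}}{m_i + \hat{k}}(\hat{\mu} - \bar{X}_i) + \sum_{i=1}^{r} m_i\bar{X}_i.
\end{align*}

It is often desirable for $TL$ to equal $TP$. The reason is that any premium
increases that will meet the approval of regulators will be based on the total
claim level from past experience. While credibility adjustments make both
practical and theoretical sense, it is usually a good idea to keep the total
unchanged. For this to happen, we need

\[ 0 = \sum_{i=1}^{r} m_i\frac{\hat{k}}{m_i + \hat{k}}(\hat{\mu} - \bar{X}_i) \]

or, because $m_i\hat{k}/(m_i + \hat{k}) = \hat{k}\hat{Z}_i$,

\[ \hat{\mu}\sum_{i=1}^{r}\hat{Z}_i = \sum_{i=1}^{r}\hat{Z}_i\bar{X}_i \]

or

\[ \hat{\mu} = \frac{\sum_{i=1}^{r}\hat{Z}_i\bar{X}_i}{\sum_{i=1}^{r}\hat{Z}_i}. \tag{16.64} \]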
Table 16.7 Data for Example 16.36

Policyholder                  Year 1    Year 2    Year 3    Year 4

1             Total claims    -         10,000    13,000    -
              No. in group    -         50        60        75

2             Total claims    18,000    21,000    17,000    -
              No. in group    100       110       105       90


That is, rather than using (16.59) to compute $\hat{\mu}$, use a credibility-weighted
average of the individual sample means. Either method provides an unbiased
estimator (given the $\hat{Z}_i$s), but this latter one has the advantage of preserving
total claims. It should be noted that when using (16.63), the value of $\bar{X}$ from
(16.54) should still be used. It can also be derived by least squares arguments.
Finally, from Example 16.7 and noting the form of $\mathrm{Var}(\bar{X}_i)$ in (16.62), the
weights in (16.64) provide the smallest unconditional variance for $\hat{\mu}$.


Example 16.36 Past data on two group policyholders are available and are 
given in Table 16.7. Determine the estimated credibility premium to be charged 
to each group in year 4. 


We first need to determine the average claims per person for each group in
each past year. We have $n_1 = 2$ years experience for group 1 and $n_2 = 3$ for
group 2. It is immaterial which past years' data we have for policyholder 1,
so for notational purposes we will choose

\[ m_{11} = 50 \quad\text{and}\quad X_{11} = \frac{10{,}000}{50} = 200. \]

Similarly, $m_{12} = 60$ and $X_{12} = 13{,}000/60 = 216.67$. Then

\begin{align*}
m_1 &= m_{11} + m_{12} = 50 + 60 = 110, \\
\bar{X}_1 &= \frac{10{,}000 + 13{,}000}{110} = 209.09.
\end{align*}

For policyholder 2,

\begin{align*}
m_{21} &= 100, \quad X_{21} = \frac{18{,}000}{100} = 180, \\
m_{22} &= 110, \quad X_{22} = \frac{21{,}000}{110} = 190.91, \\
m_{23} &= 105, \quad X_{23} = \frac{17{,}000}{105} = 161.90.
\end{align*}

Then

\begin{align*}
m_2 &= m_{21} + m_{22} + m_{23} = 100 + 110 + 105 = 315, \\
\bar{X}_2 &= \frac{18{,}000 + 21{,}000 + 17{,}000}{315} = 177.78.
\end{align*}

Now, $m = m_1 + m_2 = 110 + 315 = 425$. The overall mean is

\[ \hat{\mu} = \bar{X} = \frac{10{,}000 + 13{,}000 + 18{,}000 + 21{,}000 + 17{,}000}{425} = 185.88. \]

The alternative estimate of $\mu$ (16.64) cannot be computed until later.
Now,

\begin{align*}
\hat{v} &= \frac{\begin{array}{l} 50(200 - 209.09)^2 + 60(216.67 - 209.09)^2 + 100(180 - 177.78)^2 \\ {}+ 110(190.91 - 177.78)^2 + 105(161.90 - 177.78)^2 \end{array}}{(2-1) + (3-1)} \\
&= 17{,}837.87
\end{align*}

and so

\begin{align*}
\hat{a} &= \frac{110(209.09 - 185.88)^2 + 315(177.78 - 185.88)^2 - (17{,}837.87)(1)}{425 - (110^2 + 315^2)/425} \\
&= 380.76.
\end{align*}

Then $\hat{k} = \hat{v}/\hat{a} = 46.85$. The estimated credibility factors for the two
policyholders are

\[ \hat{Z}_1 = \frac{110}{110 + 46.85} = 0.70, \quad \hat{Z}_2 = \frac{315}{315 + 46.85} = 0.87. \]

Per individual the estimated credibility premium for policyholder 1 is

\[ \hat{Z}_1\bar{X}_1 + (1 - \hat{Z}_1)\hat{\mu} = (0.70)(209.09) + (0.30)(185.88) = 202.13 \]

and so the total estimated credibility premium for the whole group is

\[ 75(202.13) = 15{,}159.75. \]

For policyholder 2,

\[ \hat{Z}_2\bar{X}_2 + (1 - \hat{Z}_2)\hat{\mu} = (0.87)(177.78) + (0.13)(185.88) = 178.83 \]

and the total estimated credibility premium is

\[ 90(178.83) = 16{,}094.70. \]

For the alternative estimator we would use

\[ \hat{\mu} = \frac{0.70(209.09) + 0.87(177.78)}{0.70 + 0.87} = 191.74. \]

The credibility premiums are

\[ 0.70(209.09) + 0.30(191.74) = 203.89, \quad 0.87(177.78) + 0.13(191.74) = 179.59. \]

The total past credibility premium is $110(203.89) + 315(179.59) = 78{,}998.75$.
Except for rounding error, this matches the actual total losses of 79,000. □
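
The Bühlmann-Straub calculation in Example 16.36 can be sketched in code as follows. This is an illustrative implementation of (16.59), (16.61), (16.63), and (16.64) only; the function name `buhlmann_straub` and the dictionary keys are assumptions made here. Because the script carries full floating-point precision, its intermediate values differ slightly from the example, which rounds at each step (for instance it gives roughly 17,830.69 rather than 17,837.87 for the estimate of $v$).

```python
def buhlmann_straub(exposures, losses):
    """Nonparametric Buhlmann-Straub estimates.

    exposures[i][j] is m_ij and losses[i][j] is X_ij, the average loss per
    exposure unit for policyholder i in period j.
    """
    r = len(exposures)
    m_i = [sum(row) for row in exposures]
    m = sum(m_i)
    xbar_i = [
        sum(mij * xij for mij, xij in zip(ms, xs)) / mi
        for ms, xs, mi in zip(exposures, losses, m_i)
    ]
    xbar = sum(mi * xi for mi, xi in zip(m_i, xbar_i)) / m      # (16.54), (16.59)

    # (16.61): exposure-weighted within-policyholder variation
    within = sum(
        mij * (xij - xi) ** 2
        for ms, xs, xi in zip(exposures, losses, xbar_i)
        for mij, xij in zip(ms, xs)
    )
    v_hat = within / sum(len(row) - 1 for row in exposures)

    # (16.63): exposure-weighted between-policyholder variation
    between = sum(mi * (xi - xbar) ** 2 for mi, xi in zip(m_i, xbar_i))
    a_hat = (between - v_hat * (r - 1)) / (m - sum(mi ** 2 for mi in m_i) / m)

    if a_hat > 0:
        k_hat = v_hat / a_hat
        z = [mi / (mi + k_hat) for mi in m_i]
        # (16.64): credibility-weighted mean, which preserves total losses
        mu_alt = sum(zi * xi for zi, xi in zip(z, xbar_i)) / sum(z)
    else:
        z = [0.0] * r          # customary convention when a_hat is not positive
        mu_alt = xbar
    return {"m_i": m_i, "xbar_i": xbar_i, "xbar": xbar,
            "v_hat": v_hat, "a_hat": a_hat, "z": z, "mu_alt": mu_alt}


# Data of Table 16.7 (past years only), converted to averages per person
exposures = [[50, 60], [100, 110, 105]]
losses = [[10000 / 50, 13000 / 60], [18000 / 100, 21000 / 110, 17000 / 105]]
fit = buhlmann_straub(exposures, losses)

# Total past premium using the credibility-weighted mean (16.64)
total_premium = sum(
    mi * (zi * xi + (1 - zi) * fit["mu_alt"])
    for mi, zi, xi in zip(fit["m_i"], fit["z"], fit["xbar_i"])
)
print(fit["v_hat"], fit["a_hat"], fit["z"], fit["mu_alt"], total_premium)
```

With unrounded factors the total past premium reproduces the 79,000 of actual losses up to floating-point error, which illustrates the total-preserving property of (16.64).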


The above analysis assumes that the parameters $\mu$, $v$, and $a$ are all unknown
and need to be estimated, and this may not always be the case. Also, it is
assumed that $n_i > 1$ and $r > 1$. If $n_i = 1$, so that there is only one exposure
unit's experience for policyholder $i$, it is difficult to obtain information on
the process variance $v(\Theta_i)$ and thus $v$. Similarly, if $r = 1$, there is only one
policyholder and it is difficult to obtain information on the variance of the
hypothetical means $a$. In these situations, stronger assumptions are needed
such as knowledge of one or more of the parameters (e.g., the pure premium
or manual rate $\mu$, discussed below) or parametric assumptions which imply
functional relationships between the parameters (discussed in Sections 16.5.2
and 16.5.3).

To illustrate these ideas, suppose, for example, that the manual rate $\mu$
may be already known, but estimates of $a$ and $v$ may be needed. In that case,
(16.61) can still be used to estimate $v$ as it is unbiased whether $\mu$ is known
or not. (Why is $\left[\sum_{j=1}^{n_i} m_{ij}(X_{ij} - \mu)^2\right]/n_i$ not unbiased for $v$ in this case?)
Similarly, (16.63) is still an unbiased estimator for $a$. However, if $\mu$ is known,
an alternative unbiased estimator for $a$ is

\[ \tilde{a} = \sum_{i=1}^{r}\frac{m_i}{m}(\bar{X}_i - \mu)^2 - \frac{r}{m}\hat{v}, \]

where $\hat{v}$ is given by (16.61). To see this, note that

\begin{align*}
\mathrm{E}(\tilde{a}) &= \sum_{i=1}^{r}\frac{m_i}{m}\mathrm{E}[(\bar{X}_i - \mu)^2] - \frac{r}{m}\mathrm{E}(\hat{v}) \\
&= \sum_{i=1}^{r}\frac{m_i}{m}\mathrm{Var}(\bar{X}_i) - \frac{r}{m}v \\
&= \sum_{i=1}^{r}\frac{m_i}{m}\left(a + \frac{v}{m_i}\right) - \frac{r}{m}v = a.
\end{align*}

If there are data on only one policyholder, an approach like this is necessary.
Clearly, (16.60) provides an estimator for $v$ based on data from policyholder
$i$ alone, and an unbiased estimator for $a$ based on data from policyholder $i$
alone is

\[ \tilde{a}_i = (\bar{X}_i - \mu)^2 - \frac{\hat{v}_i}{m_i} = (\bar{X}_i - \mu)^2 - \frac{\sum_{j=1}^{n_i} m_{ij}(X_{ij} - \bar{X}_i)^2}{m_i(n_i - 1)}, \]

which is unbiased because $\mathrm{E}[(\bar{X}_i - \mu)^2] = \mathrm{Var}(\bar{X}_i) = a + v/m_i$ and $\mathrm{E}(\hat{v}_i) = v$.


Example 16.37 For a group policyholder, we have the following data available:

                 Year 1    Year 2    Year 3
Total claims     60,000    70,000    -
No. in group     125       150       200

If the manual rate per person is 500 per year, estimate the total credibility
premium for year 3.


In the above notation, we have (assuming for notational purposes that this
group is policyholder $i$) $m_{i1} = 125$, $X_{i1} = 60{,}000/125 = 480$, $m_{i2} = 150$,
$X_{i2} = 70{,}000/150 = 466.67$, $m_i = m_{i1} + m_{i2} = 275$, and $\bar{X}_i = (60{,}000 +
70{,}000)/275 = 472.73$. Then

\[ \hat{v}_i = \frac{125(480 - 472.73)^2 + 150(466.67 - 472.73)^2}{2 - 1} = 12{,}115.15, \]

and with $\mu = 500$, $\tilde{a}_i = (472.73 - 500)^2 - (12{,}115.15/275) = 699.60$. We then
estimate $k$ by $\hat{v}_i/\tilde{a}_i = 17.32$. The estimated credibility factor is $m_i/(m_i +
\hat{v}_i/\tilde{a}_i) = 275/(275 + 17.32) = 0.94$. The estimated credibility premium per
person is then $0.94(472.73) + 0.06(500) = 474.37$ and the estimated total
credibility premium for year 3 is $200(474.37) = 94{,}874$. □
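
The single-policyholder calculation with a known manual rate can be sketched as below. The function `single_policyholder_premium` and its argument names are hypothetical, chosen only for this illustration. It applies (16.60) and the estimator $\tilde{a}_i$ without intermediate rounding, so its output (about 12,121.21 for $\hat{v}_i$ and 474.34 per person) differs slightly from the rounded figures in Example 16.37.

```python
def single_policyholder_premium(exposures, total_claims, manual_rate, next_exposure):
    """Credibility premium from one policyholder's data and a known manual rate.

    exposures[j] is m_ij, total_claims[j] is m_ij * X_ij, manual_rate is mu,
    and next_exposure is the exposure to be rated in the coming period.
    """
    m_i = sum(exposures)
    n_i = len(exposures)
    x = [c / m for c, m in zip(total_claims, exposures)]      # X_ij
    xbar_i = sum(total_claims) / m_i

    # (16.60): process-variance estimate from this policyholder alone
    v_i = sum(m * (xj - xbar_i) ** 2 for m, xj in zip(exposures, x)) / (n_i - 1)

    # Estimator of a based on this policyholder alone, using the known mu
    a_i = (xbar_i - manual_rate) ** 2 - v_i / m_i

    z = m_i / (m_i + v_i / a_i) if a_i > 0 else 0.0
    per_person = z * xbar_i + (1 - z) * manual_rate
    return v_i, a_i, z, per_person, next_exposure * per_person


v_i, a_i, z, per_person, total = single_policyholder_premium(
    [125, 150], [60000, 70000], manual_rate=500, next_exposure=200
)
print(round(v_i, 2), round(a_i, 2), round(z, 4), round(per_person, 2), round(total, 2))
```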


It is instructive to note that estimation of the parameters $a$ and $v$ based on
data from a single policyholder (as in the example above) is not advised unless
there is no alternative because the estimators $\hat{v}_i$ and $\tilde{a}_i$ have high variability.
In particular, we are effectively estimating $a$ from one observation ($\bar{X}_i$). It is
strongly suggested that an attempt be made to obtain more data.


16.5.2 Semiparametric estimation 


In some situations it may be reasonable to assume a parametric form for the
conditional distribution $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$. The situation at hand may suggest
that such an assumption is reasonable or prior information may imply its
appropriateness.

For example, in dealing with numbers of claims, it may be reasonable
to assume that the number of claims $m_{ij}X_{ij}$ for policyholder $i$ in year $j$
is Poisson distributed with mean $m_{ij}\theta_i$ given $\Theta_i = \theta_i$. Thus $\mathrm{E}(m_{ij}X_{ij}|\Theta_i) =
\mathrm{Var}(m_{ij}X_{ij}|\Theta_i) = m_{ij}\Theta_i$, implying that $\mu(\Theta_i) = v(\Theta_i) = \Theta_i$ and so $\mu = v$
in this case. Rather than use (16.61) to estimate $v$, we could use $\hat{\mu} = \bar{X}$ to
estimate $v$.
Example 16.38 In the past year, the distribution of automobile insurance
policyholders by number of claims is given below.

No. of claims    No. of insureds

0                1,563
1                271
2                32
3                7
4                2
Total            1,875

For each policyholder, obtain a credibility estimate for the number of claims
next year based on the past year's experience, assuming a (conditional) Poisson
distribution of number of claims for each policyholder.


Assume that we have $r = 1{,}875$ policyholders, $n_i = 1$ year experience on
each, and exposures $m_{ij} = 1$. For policyholder $i$ (where $i = 1,\ldots,1{,}875$)
assume that $X_{i1}|\Theta_i = \theta_i$ is Poisson distributed with mean $\theta_i$ so that $\mu(\theta_i) =
v(\theta_i) = \theta_i$ and $\mu = v$. As in Example 16.34,

\begin{align*}
\bar{X} &= \frac{1}{1{,}875}\left(\sum_{i=1}^{1{,}875} X_{i1}\right) \\
&= \frac{0(1{,}563) + 1(271) + 2(32) + 3(7) + 4(2)}{1{,}875} = 0.194.
\end{align*}

Now,

\begin{align*}
\mathrm{Var}(X_{i1}) &= \mathrm{Var}[\mathrm{E}(X_{i1}|\Theta_i)] + \mathrm{E}[\mathrm{Var}(X_{i1}|\Theta_i)] \\
&= \mathrm{Var}[\mu(\Theta_i)] + \mathrm{E}[v(\Theta_i)] = a + v = a + \mu.
\end{align*}

Thus an unbiased estimator of $a + v$ is the sample variance

\begin{align*}
\frac{\sum_{i=1}^{1{,}875}(X_{i1} - \bar{X})^2}{1{,}874} &= \frac{\begin{array}{l} 1{,}563(0 - 0.194)^2 + 271(1 - 0.194)^2 \\ {}+ 32(2 - 0.194)^2 + 7(3 - 0.194)^2 + 2(4 - 0.194)^2 \end{array}}{1{,}874} \\
&= 0.226.
\end{align*}

Thus $\hat{a} = 0.226 - 0.194 = 0.032$ and $\hat{k} = 0.194/0.032 = 6.06$ and the credibility
factor $\hat{Z}$ is $1/(1 + 6.06) = 0.14$. The estimated credibility premium for the
number of claims for each policyholder is $(0.14)X_{i1} + (0.86)(0.194)$, where $X_{i1}$
is 0, 1, 2, 3, or 4, depending on the policyholder. □


Note that in this case $v = \mu$ identically, so that only one year's experience
per policyholder is needed.
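
The semiparametric Poisson calculation can be reproduced from the frequency table alone. The sketch below is illustrative; the function name `poisson_semiparametric` and the dictionary input format are assumptions made here. It uses unrounded intermediate values, so it gives $\hat{k}$ of about 6.11 rather than the 6.06 obtained above from rounded figures, while the credibility factor still rounds to 0.14.

```python
def poisson_semiparametric(counts):
    """Semiparametric credibility for Poisson claim counts.

    `counts` maps a number of claims to the number of insureds with that many
    claims (one year of experience, unit exposure for every insured).
    Returns (mu_hat, a_hat, z_hat), using v_hat = mu_hat since v = mu.
    """
    r = sum(counts.values())
    mean = sum(k * c for k, c in counts.items()) / r
    # Sample variance estimates a + v; subtracting v_hat = mean estimates a
    var = sum(c * (k - mean) ** 2 for k, c in counts.items()) / (r - 1)
    a_hat = var - mean
    z_hat = 1 / (1 + mean / a_hat) if a_hat > 0 else 0.0
    return mean, a_hat, z_hat


table = {0: 1563, 1: 271, 2: 32, 3: 7, 4: 2}
mu_hat, a_hat, z_hat = poisson_semiparametric(table)
for claims in table:
    premium = z_hat * claims + (1 - z_hat) * mu_hat
    print(claims, round(premium, 3))
```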
Example 16.39 Suppose we are interested in the probability that an individual
in a group makes a claim (e.g., group life insurance), and the probability
is believed to vary by policyholder. Then $m_{ij}X_{ij}$ could represent the number
of the $m_{ij}$ individuals in year $j$ for policyholder $i$ who made a claim. Develop
a credibility model for this situation.


If the claim probability is $\theta_i$ for policyholder $i$, then a reasonable model to
describe this effect is that $m_{ij}X_{ij}$ is binomially distributed with parameters
$m_{ij}$ and $\theta_i$, given $\Theta_i = \theta_i$. Then

\[ \mathrm{E}(m_{ij}X_{ij}|\Theta_i) = m_{ij}\Theta_i \quad\text{and}\quad \mathrm{Var}(m_{ij}X_{ij}|\Theta_i) = m_{ij}\Theta_i(1 - \Theta_i) \]

and so $\mu(\Theta_i) = \Theta_i$ with $v(\Theta_i) = \Theta_i(1 - \Theta_i)$. Thus

\begin{align*}
\mu &= \mathrm{E}(\Theta_i), \quad v = \mu - \mathrm{E}[(\Theta_i)^2], \\
a &= \mathrm{Var}(\Theta_i) = \mathrm{E}[(\Theta_i)^2] - \mu^2 = \mu - v - \mu^2. \qquad □
\end{align*}


In these examples there is a functional relationship between the parameters
$\mu$, $v$, and $a$ which follows from the parametric assumptions made, and this
often facilitates estimation of parameters.


16.5.3 Parametric estimation


If fully parametric assumptions are made with respect to $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$ and
$\pi(\theta_i)$ for $i = 1,\ldots,r$ and $j = 1,\ldots,n_i$, then the full battery of parametric
estimation techniques is available in addition to the nonparametric methods
discussed earlier. In particular, maximum likelihood estimation is
straightforward (at least in principle) and is now discussed. For policyholder $i$, the
joint density of $\mathbf{X}_i = (X_{i1},\ldots,X_{in_i})^T$ is, by conditioning on $\Theta_i$, given for
$i = 1,\ldots,r$ by

\[ f_{\mathbf{X}_i}(\mathbf{x}_i) = \int\left\{\prod_{j=1}^{n_i} f_{X_{ij}|\Theta}(x_{ij}|\theta_i)\right\}\pi(\theta_i)\,d\theta_i. \tag{16.65} \]

The likelihood function is given by

\[ L = \prod_{i=1}^{r} f_{\mathbf{X}_i}(\mathbf{x}_i). \tag{16.66} \]

Maximum likelihood estimators of the parameters are then chosen to maximize
$L$ or equivalently $\ln L$.
Example 16.40 As a simple erample, suppose that n; = n fori =1,...,7 
and mij = 1. Let X;;|Q; be Poisson distributed with mean ©;, that is, 


07% eti 
2 


Fxi3\0 (tig |8i) = =, tz =0,1,..., 
Tiz! 
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and let ©; be exponentially distributed with mean p, 
x(6;) = l -oi/u 
7( ee . 8; > 0. 


Determine the maximum likelihood estimator of p. 


Equation (16.65) becomes 


ps) = f° (TTS) Seow 
x(x) = + | et dh; 
o (j 7u! u 


: 1 

1 [? 2, ai; 
[[ 2s! if gu i= Fe Oi(n+1/) dA; 
ey H Jo 


ll 


Il 


pa 


ere: | 
oe (a+; n ma 


where C(x;) may be expressed in combinatorial notation as 


and 
n 
a= Nti +1. 
j=l 


The integral is that of a gamma density with parameters a and 1/6 and 
therefore equals 1, and so 


1\7 ja il 
F (xi) = C(x;)u7* (n + x) . 


Substitution into (16.66) yields 


1\~ Viet Lj=1 Tig 
L(y) x po (> + x] : 


Thus 


iy) =n L(y) = -rinu — HEE) In (n+2) +c, 


i=1 j=1 
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where c is a constant which does not depend on yp. Differentiating yields 


I'( yan Tt Nien jen tia (-5) 
a n+ž B) 


The maximum likelihood estimator ĝ of p is found by setting l’ (A) = 0, which 


yields 
T n 
T M r+ isi yal Tij 
ji (iin +1) 
and so 
pn1a14 2S ay 
Le arg st 1 
or 
== yay. 
i=1 j=1 


But this is the same as the nonparametric estimate obtained in Example 16.34.
An explanation is in order. We have $\mu(\theta_i) = \theta_i$ by the Poisson
assumption and so $E[\mu(\Theta_i)] = E(\Theta_i)$, which is the same $\mu$ as was
used in the exponential distribution $\pi(\theta_i)$.

Furthermore, $v(\theta_i) = \theta_i$ as well (by the Poisson assumption), and so
$v = E[v(\Theta_i)] = \mu$. Also, $a = \mathrm{Var}[\mu(\Theta_i)] = \mathrm{Var}(\Theta_i) = \mu^2$
by the exponential assumption for $\pi(\theta_i)$. Thus the maximum likelihood
estimators of $v$ and $a$ are $\hat{\mu}$ and $\hat{\mu}^2$ by the invariance of
maximum likelihood estimation under a parameter transformation. Similarly, the
maximum likelihood estimators of $k = v/a$, the credibility factor $Z$, and the
credibility premium $Z\bar{X}_i + (1-Z)\mu$ are $\hat{k} = \hat{\mu}^{-1} = \bar{X}^{-1}$,
$\hat{Z} = n/(n+\hat{\mu}^{-1})$, and $\hat{Z}\bar{X}_i + (1-\hat{Z})\hat{\mu}$, respectively.
We mention also that credibility is exact in this model so that the Bayesian
premium is equal to the credibility premium. □
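The result of Example 16.40 can be checked numerically. The following Python sketch evaluates the log-likelihood (up to its additive constant) and confirms that it peaks at the overall sample mean. The claim counts and the function names are hypothetical, chosen only for illustration.

```python
import math

def log_likelihood(mu, counts):
    """l(mu) up to an additive constant, for the Poisson-exponential model.

    counts is a list of r lists, each holding n claim counts x_ij.
    """
    r = len(counts)
    n = len(counts[0])
    total = sum(sum(row) for row in counts)
    return -r * math.log(mu) - (r + total) * math.log(n + 1.0 / mu)

def mle_mu(counts):
    """Closed-form maximizer: the overall sample mean."""
    r = len(counts)
    n = len(counts[0])
    return sum(sum(row) for row in counts) / (n * r)

# Hypothetical data: r = 3 policyholders observed for n = 4 years each.
counts = [[0, 1, 0, 2], [1, 1, 3, 0], [0, 0, 1, 0]]
mu_hat = mle_mu(counts)      # 9 claims over 12 exposures
k_hat = 1.0 / mu_hat         # estimate of k = v/a
z_hat = 4 / (4 + k_hat)      # estimated credibility factor with n = 4
```

Because the estimators of $v$, $a$, $k$, and $Z$ are functions of $\hat{\mu}$ alone, only one number needs to be estimated from the data in this model.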


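Example 16.41 Suppose that $n_i = n$ for all $i$ and $m_{ij} = 1$. Assume that
$X_{ij}|\Theta_i \sim N(\Theta_i, v)$,

$$f_{X_{ij}|\Theta}(x_{ij}|\theta_i) = (2\pi v)^{-1/2}\exp\left[-\frac{1}{2v}(x_{ij}-\theta_i)^2\right], \quad -\infty < x_{ij} < \infty,$$

and $\Theta_i \sim N(\mu, a)$, so that

$$\pi(\theta_i) = (2\pi a)^{-1/2}\exp\left[-\frac{1}{2a}(\theta_i-\mu)^2\right], \quad -\infty < \theta_i < \infty.$$

Determine the maximum likelihood estimators of the parameters.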


We have $\mu(\theta_i) = \theta_i$ and $v(\theta_i) = v$. Thus $\mu = E[\mu(\Theta_i)]$,
$v = E[v(\Theta_i)]$, and $a = \mathrm{Var}[\mu(\Theta_i)]$, consistent with previous use
of $\mu$, $v$, and $a$. We shall now derive maximum likelihood estimators of
$\mu$, $v$, and $a$. To begin with, consider $\bar{X}_i = n^{-1}\sum_{j=1}^n X_{ij}$.
Conditional on $\Theta_i$, the $X_{ij}$ are independent $N(\Theta_i, v)$ random
variables, implying that $\bar{X}_i|\Theta_i \sim N(\Theta_i, v/n)$. Because
$\Theta_i \sim N(\mu, a)$, it follows from Example 4.30 that unconditionally
$\bar{X}_i \sim N(\mu, a+v/n)$. Hence the density of $\bar{X}_i$ is, with $w = a + v/n$,

$$f(\bar{x}_i) = (2\pi w)^{-1/2}\exp\left[-\frac{1}{2w}(\bar{x}_i-\mu)^2\right], \quad -\infty < \bar{x}_i < \infty.$$

On the other hand, by conditioning on $\Theta_i$, we have

$$f(\bar{x}_i) = \int_{-\infty}^{\infty} (2\pi v/n)^{-1/2}\exp\left[-\frac{n}{2v}(\bar{x}_i-\theta_i)^2\right] (2\pi a)^{-1/2}\exp\left[-\frac{1}{2a}(\theta_i-\mu)^2\right] d\theta_i.$$

Ignoring terms not involving $\mu$, $v$, or $a$, this means that $f(\bar{x}_i)$ is
proportional to

$$v^{-1/2}a^{-1/2}\int_{-\infty}^{\infty}\exp\left[-\frac{n}{2v}(\bar{x}_i-\theta_i)^2 - \frac{1}{2a}(\theta_i-\mu)^2\right] d\theta_i.$$

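Now (16.65) yields

$$f(\mathbf{x}_i) = \int_{-\infty}^{\infty}\left\{\prod_{j=1}^n (2\pi v)^{-1/2}\exp\left[-\frac{1}{2v}(x_{ij}-\theta_i)^2\right]\right\} (2\pi a)^{-1/2}\exp\left[-\frac{1}{2a}(\theta_i-\mu)^2\right] d\theta_i,$$

which is proportional to

$$v^{-n/2}a^{-1/2}\int_{-\infty}^{\infty}\exp\left[-\frac{1}{2v}\sum_{j=1}^n (x_{ij}-\theta_i)^2 - \frac{1}{2a}(\theta_i-\mu)^2\right] d\theta_i.$$

Now use the identity (16.10) restated as

$$\sum_{j=1}^n (x_{ij}-\theta_i)^2 = \sum_{j=1}^n (x_{ij}-\bar{x}_i)^2 + n(\bar{x}_i-\theta_i)^2,$$

which means that $f(\mathbf{x}_i)$ is proportional to

$$v^{-n/2}a^{-1/2}\exp\left[-\frac{1}{2v}\sum_{j=1}^n (x_{ij}-\bar{x}_i)^2\right]\int_{-\infty}^{\infty}\exp\left[-\frac{n}{2v}(\bar{x}_i-\theta_i)^2 - \frac{1}{2a}(\theta_i-\mu)^2\right] d\theta_i,$$

itself proportional to

$$v^{-(n-1)/2}\exp\left[-\frac{1}{2v}\sum_{j=1}^n (x_{ij}-\bar{x}_i)^2\right] f(\bar{x}_i)$$

using the second expression for the density $f(\bar{x}_i)$ of $\bar{X}_i$ given above.
Then (16.66) yields

$$L \propto v^{-r(n-1)/2}\exp\left[-\frac{1}{2v}\sum_{i=1}^r\sum_{j=1}^n (x_{ij}-\bar{x}_i)^2\right]\prod_{i=1}^r f(\bar{x}_i).$$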

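Let us now invoke the invariance of maximum likelihood estimators under a
parameter transformation and use $\mu$, $v$, and $w = a + v/n$ rather than $\mu$, $v$,
and $a$. This means that

$$L \propto L_1(v)\,L_2(\mu, w),$$

where

$$L_1(v) = v^{-r(n-1)/2}\exp\left[-\frac{1}{2v}\sum_{i=1}^r\sum_{j=1}^n (x_{ij}-\bar{x}_i)^2\right]$$

and

$$L_2(\mu, w) = \prod_{i=1}^r f(\bar{x}_i) = \prod_{i=1}^r (2\pi w)^{-1/2}\exp\left[-\frac{1}{2w}(\bar{x}_i-\mu)^2\right].$$

The maximum likelihood estimator $\hat{v}$ of $v$ can be found by maximizing $L_1(v)$
alone and the mle $(\hat{\mu}, \hat{w})$ of $(\mu, w)$ can be found by maximizing $L_2(\mu, w)$.
Taking logarithms, we obtain

$$l_1(v) = -\frac{r(n-1)}{2}\ln v - \frac{1}{2v}\sum_{i=1}^r\sum_{j=1}^n (x_{ij}-\bar{x}_i)^2,$$

$$l_1'(v) = -\frac{r(n-1)}{2v} + \frac{1}{2v^2}\sum_{i=1}^r\sum_{j=1}^n (x_{ij}-\bar{x}_i)^2,$$

and with $l_1'(\hat{v}) = 0$ we have

$$\hat{v} = \frac{\sum_{i=1}^r\sum_{j=1}^n (X_{ij}-\bar{X}_i)^2}{r(n-1)}.$$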


Because $L_2(\mu, w)$ is the usual normal likelihood, the mles are simply the
empirical mean and variance. That is,

$$\hat{\mu} = \frac{1}{r}\sum_{i=1}^r \bar{X}_i = \frac{1}{nr}\sum_{i=1}^r\sum_{j=1}^n X_{ij} = \bar{X}$$

and

$$\hat{w} = \frac{1}{r}\sum_{i=1}^r (\bar{X}_i - \bar{X})^2.$$

But $a = w - v/n$ and so the maximum likelihood estimator of $a$ is

$$\hat{a} = \frac{1}{r}\sum_{i=1}^r (\bar{X}_i - \bar{X})^2 - \frac{1}{rn(n-1)}\sum_{i=1}^r\sum_{j=1}^n (X_{ij}-\bar{X}_i)^2.$$

It is instructive to note that the maximum likelihood estimators $\hat{\mu}$ and
$\hat{v}$ are exactly the nonparametric unbiased estimators in the Bühlmann model of
Example 16.34. The maximum likelihood estimator $\hat{a}$ is almost the same as
the nonparametric unbiased estimator, the only difference being the divisor $r$
rather than $r - 1$ in the first term. □
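The three estimators of Example 16.41 are simple sums of squares, as the following Python sketch illustrates. The data set and the function name are hypothetical; the formulas are the ones derived above for equal sample sizes $n$ and $m_{ij} = 1$.

```python
def normal_normal_mle(data):
    """Return (mu_hat, v_hat, a_hat) for r rows of n observations each."""
    r = len(data)
    n = len(data[0])
    row_means = [sum(row) / n for row in data]
    grand_mean = sum(row_means) / r
    # Within-row and between-row sums of squares.
    within = sum((x - m) ** 2 for row, m in zip(data, row_means) for x in row)
    between = sum((m - grand_mean) ** 2 for m in row_means)
    v_hat = within / (r * (n - 1))
    w_hat = between / r            # mle of w = a + v/n (divisor r, not r - 1)
    a_hat = w_hat - v_hat / n
    return grand_mean, v_hat, a_hat

# Hypothetical data: r = 2 policyholders, n = 3 observations each.
data = [[10.0, 12.0, 14.0], [20.0, 22.0, 24.0]]
mu_hat, v_hat, a_hat = normal_normal_mle(data)
```

Note that nothing in the formula prevents `a_hat` from being negative for some data sets; the sketch makes no attempt to handle that case.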


16.5.4 Notes and References 


In this section a simple approach was employed to find parameter estimates. 
No attempt was made to find optimum estimators in the sense of minimum 
variance. A good deal of research has been done on this problem. See 
Goovaerts and Hoogstad [46] for more details and further references. 


16.5.5 Exercises 


16.60 Past claims data on a portfolio of policyholders are given in Table 16.8.
Estimate the Bühlmann credibility premium for each of the three policyholders
for year 4.


16.61 Past data on a portfolio of group policyholders are given in Table 16.9.
Estimate the Bühlmann–Straub credibility premiums to be charged to each
group in year 4.


16.62 For the situation in Exercise 16.9, estimate the Bühlmann credibility
premium for the next year for the policyholder.


Table 16.8 Data for Exercise 16.60

                      Year
Policyholder      1      2      3
     1           750    800    650
     2           625    600    675
     3           900    950    850


Table 16.9 Data for Exercise 16.61

                                        Year
Policyholder                     1         2         3        4
     1        Claims             -      20,000    25,000      -
              No. in group       -         100       120     110
     2        Claims          19,000    18,000    17,000      -
              No. in group        90        75        70      60
     3        Claims          26,000    30,000    35,000      -
              No. in group       150       175       180     200


16.63 Consider the Bühlmann model in Example 16.34. 


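(a) Prove that $\mathrm{Var}(X_{ij}) = a + v$.

(b) If $\{X_{ij}: i = 1,\ldots,r \text{ and } j = 1,\ldots,n\}$ are unconditionally
independent for all $i$ and $j$, argue that an unbiased estimator of $a + v$ is

$$\frac{1}{nr-1}\sum_{i=1}^r\sum_{j=1}^n (X_{ij}-\bar{X})^2.$$

(c) Prove the algebraic identity

$$\sum_{i=1}^r\sum_{j=1}^n (X_{ij}-\bar{X})^2 = \sum_{i=1}^r\sum_{j=1}^n (X_{ij}-\bar{X}_i)^2 + n\sum_{i=1}^r (\bar{X}_i-\bar{X})^2.$$

(d) Show that, unconditionally,

$$E\left[\frac{1}{nr-1}\sum_{i=1}^r\sum_{j=1}^n (X_{ij}-\bar{X})^2\right] = (v+a) - \frac{n-1}{nr-1}\,a.$$

(e) Comment on the implications of (b) and (d).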


16.64 The distribution of automobile insurance policyholders by number of 
claims is given in Table 16.10. 

Assuming a (conditional) Poisson distribution for the number of claims per 
policyholder, estimate the Bühlmann credibility premiums for the number of 
claims next year. 


Table 16.10 Data for Exercise 16.64

No. of claims    No. of insureds
      0               2,500
      1                 250
      2                  30
      3                   5
      4                   2
    Total             2,787


16.65 Suppose that, given $\Theta$, $X_1,\ldots,X_n$ are independently geometrically
distributed with pf

$$f_{X_j|\Theta}(x_j|\theta) = \frac{1}{1+\theta}\left(\frac{\theta}{1+\theta}\right)^{x_j}, \quad x_j = 0,1,\ldots.$$

(a) Show that $\mu(\theta) = \theta$ and $v(\theta) = \theta(1+\theta)$.

(b) Prove that $a = v - \mu - \mu^2$.

(c) Rework Exercise 16.64 assuming a (conditional) geometric distribution.

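16.66 Suppose that

$$\Pr(m_{ij}X_{ij} = t_{ij}|\Theta_i = \theta_i) = \frac{(m_{ij}\theta_i)^{t_{ij}} e^{-m_{ij}\theta_i}}{t_{ij}!}$$

and

$$\pi(\theta_i) = \frac{1}{\mu} e^{-\theta_i/\mu}, \quad \theta_i > 0.$$

Write down the equation satisfied by the maximum likelihood estimator $\hat{\mu}$ of
$\mu$ for Bühlmann–Straub-type data.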


16.67 (a) Prove the algebraic identity

$$\sum_{i=1}^r\sum_{j=1}^{n_i} m_{ij}(X_{ij}-\bar{X})^2 = \sum_{i=1}^r\sum_{j=1}^{n_i} m_{ij}(X_{ij}-\bar{X}_i)^2 + \sum_{i=1}^r m_i(\bar{X}_i-\bar{X})^2.$$

(b) Use part (a) and (16.61) to show that (16.63) may be expressed as

$$\hat{a} = m_*^{-1}\left[\frac{\sum_{i=1}^r\sum_{j=1}^{n_i} m_{ij}(X_{ij}-\bar{X})^2}{\sum_{i=1}^r n_i - 1} - \hat{v}\right],$$

where

$$m_* = \frac{\sum_{i=1}^r m_i\left(1-\dfrac{m_i}{m}\right)}{\sum_{i=1}^r n_i - 1}.$$


16.68 (*) A group of 340 insureds in a high-crime area submit the 210 theft
claims in a one-year period as given in Table 16.11.


Table 16.11 Data for Exercise 16.68

Number of claims    Number of insureds
       0                   200
       1                    80
       2                    50
       3                    10


Each insured is assumed to have a Poisson distribution for the number of
thefts, but the mean of such a distribution may vary from one insured to
another. If a particular insured experienced two claims in the observation
period, determine the Bühlmann credibility estimate for the number of claims
for this insured in the next period.


Simulation 


17.1 BASICS OF SIMULATION 


Simulation has had an on-again, off-again history in actuarial practice. For 
example, in the 1970s, aggregate loss calculations were commonly done by 
simulation because the analytical methods available at the time were not ade- 
quate. However, the typical simulation often took a full day on the company’s 
mainframe computer, a serious drag on resources. In the 1980s analytic meth- 
ods such as Heckman—Meyers and the recursive formula were developed and 
were found to be significantly faster and more accurate. Today, desktop com- 
puters have sufficient power to run complex simulations that allow for the 
analysis of models not suitable for current analytic approaches. 

In a similar vein, as investment vehicles become more complex, contracts 
have interest-sensitive components, and market fluctuations seem to be more 
pronounced, analysis of future cash flows must be done on a stochastic basis. 
In order to accommodate the complexities of the products and interest rate 
models, simulation has become the technique of choice. 

In this chapter we will provide some illustrations of how simulation can 
solve problems such as those mentioned above. It is not our intention to 
cover the subject in great detail, but rather to give the reader an idea of how 
simulation can help. Study of simulation texts such as Herzog and Lord [53],
Ripley [110], and Ross [115] will provide many important additional insights.
In addition, simulation can also be an aid in evaluating some of the statistical
techniques covered in earlier chapters. This will also be covered here with an
emphasis on the bootstrap method.


Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.


17.1.1 The simulation approach 


The beauty of simulation is that once the model is created little additional
creative thought is required.¹ The entire process can be summarized in four
steps, where the goal is to determine values relating to the distribution of a
random variable $S$.


1. Build a model for $S$ which depends on random variables $X, Y, Z, \ldots$,
where their distributions and any dependencies are known.


2. For $j = 1,\ldots,n$ generate pseudorandom values $x_j, y_j, z_j, \ldots$ and then
compute $s_j$ using the model from step 1.


3. The cdf of $S$ may be approximated by $F_n(s)$, the empirical cdf based on
the pseudorandom sample $s_1,\ldots,s_n$.


4. Compute quantities of interest, such as the mean, variance, percentiles,
or probabilities, using the empirical cdf.


Two questions remain. First, what does it mean to generate a pseudorandom
variable? Consider a random variable $X$ with cdf $F_X(x)$. This is the
real random variable produced by some phenomenon of interest. For example,
it may be the result of the experiment “collect one automobile bodily injury
medical payment at random and record its value.” We assume that the cdf is
known. For example, it may be the Pareto cdf,
$F_X(x) = 1 - \left(\frac{1{,}000}{1{,}000+x}\right)^3$. Now
consider a second random variable, $X^*$, resulting from some other process, but
with the same Pareto distribution. A random sample from $X^*$, say $x_1^*,\ldots,x_n^*$,
would be impossible to distinguish from one taken from $X$. That is, given the
$n$ numbers, we could not tell if they arose from automobile claims or something
else. This means that, instead of learning about $X$ by observing automobile
claims, we could learn about it by observing $X^*$. Obtaining a random sample
from a Pareto distribution is still probably difficult, so we have not yet
accomplished much.

We can make some progress by making a concession. Let us accept as a
replacement for a random sample from $X^*$ a sequence of numbers $x_1^{**},\ldots,x_n^{**}$
which is not a random sample at all, but simply a sequence of numbers which
may not be independent, or even random, but was generated by some known
process that is related to the random variable $X^*$. Such a sequence is called
a pseudorandom sequence because anyone who did not know its origin could
not distinguish it from a random sample from $X^*$ (and therefore from $X$).
Such a sequence will be satisfactory for our purposes.

¹This is not entirely true. A great deal of creativity may be employed in designing an
efficient simulation. The brute force approach used here will work; it just may take your
computer longer to produce the answer.

The field of developing processes for generating pseudorandom sequences
of numbers has been well developed. One fact that makes it easier to do this
is that it is sufficient to be able to generate such sequences for the uniform
distribution on the interval (0,1). That is because, if $U$ has the uniform(0,1)
distribution, then $X = F_X^{-1}(U)$ will have $F_X(x)$ as its cdf. Therefore, we
simply obtain uniform pseudorandom numbers $u_1^{**},\ldots,u_n^{**}$ and then let
$x_j^{**} = F_X^{-1}(u_j^{**})$. This is called the inversion method of generating
random variates. Specific methods for particular distributions have been
developed but will not be discussed here. There is a considerable literature on
the best ways to generate pseudorandom uniform numbers and a variety of tests
proposed to evaluate them. Readers are cautioned to ensure that the generator
being used is a good one.
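The inversion method can be sketched in a few lines of Python. The example below assumes the Pareto cdf with $\alpha = 3$ and $\theta = 1{,}000$ used in this section; the function names, the seed, and the choice of Python's built-in `random` generator are illustrative assumptions, not part of the text.

```python
import random

def pareto_inverse_cdf(u, alpha=3.0, theta=1000.0):
    """Solve u = 1 - (theta / (theta + x))**alpha for x."""
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def simulate_pareto(n, alpha=3.0, theta=1000.0, seed=12345):
    """Generate n pseudo-Pareto values by inverting pseudouniform numbers."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    # rng.random() lies in [0, 1), so 1 - u is never zero.
    return [pareto_inverse_cdf(rng.random(), alpha, theta) for _ in range(n)]

sample = simulate_pareto(10_000)
first = pareto_inverse_cdf(0.54246)  # a uniform value of 0.54246 maps to about 297.75
```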


Example 17.1 Generate 10,000 pseudo-Pareto (with $\alpha = 3$ and $\theta = 1{,}000$)
variates and verify that they are indistinguishable from real Pareto observations.


The pseudouniform values were obtained using the built-in generator supplied
with a commercial programming language. The pseudo-Pareto values
are calculated from

$$u^{**} = 1 - \left(\frac{1{,}000}{1{,}000 + x^{**}}\right)^3.$$

That is,

$$x^{**} = 1{,}000[(1 - u^{**})^{-1/3} - 1].$$

So, for example, if the first value generated is $u_1^{**} = 0.54246$, we have
$x_1^{**} = 297.75$. This was repeated 10,000 times. The results are displayed in
Table 17.1, where a chi-square goodness-of-fit test is conducted. The expected
counts are calculated using the Pareto distribution with $\alpha = 3$ and $\theta = 1{,}000$.
Because the parameters are known, there are nine degrees of freedom. At a
significance level of 5% the critical value is 16.92, and we conclude that the
pseudorandom sample could have been a random sample from this Pareto
distribution. □


Table 17.1 Chi-square test of simulated Pareto observations

Interval          Observed    Expected    Chi square
0-100                2,519    2,486.85       0.42
100-250              2,348    2,393.15       0.85
250-500              2,196    2,157.04       0.70
500-750              1,071    1,097.07       0.62
750-1,000              635      615.89       0.59
1,000-1,500            589      610.00       0.72
1,500-2,500            409      406.76       0.01
2,500-5,000            192      186.94       0.14
5,000-10,000            36       38.78       0.20
10,000-                  5        7.51       0.84
Total               10,000   10,000          5.10


When the distribution function of $X$ is continuous and strictly increasing,
the equation $u = F_X(x)$ will have a unique solution for any $u$. In that case
the inversion method reduces to solving the equation. In other cases some
care must be taken. Suppose $F_X(x)$ jumps at $x = c$ so that $F_X(c-) = a$ and
$F_X(c) = b > a$. If the uniform number is such that $a \le u < b$, the equation
has no solution. In that situation choose $c$ as the simulated value.


Example 17.2 Suppose

$$F_X(x) = \begin{cases} 0.5x, & 0 \le x < 1, \\ 0.5 + 0.25x, & 1 \le x \le 2. \end{cases}$$

Determine the simulated values of $x$ resulting from the uniform numbers 0.3,
0.6, and 0.9.


In the first interval, the distribution function ranges from 0 to 0.5 and in
the second interval from 0.75 to 1. With $u = 0.3$ in the first interval, solve
$0.3 = 0.5x$ for $x = 0.6$. With the distribution function jumping from 0.5 to
0.75 at $x = 1$, any $u$ in that interval will lead to a simulated value of 1, so for
$u = 0.6$, the simulated value is $x = 1$. Note that $\Pr(0.5 \le U < 0.75) = 0.25$,
so the value of $x = 1$ will be simulated 25% of the time, matching its true
probability. Finally, with 0.9 in the second interval, solve $0.9 = 0.5 + 0.25x$
for $x = 1.6$. Figure 17.1 illustrates this process, showing how drawing vertical
bars on the function makes the inversion obvious. □


It is also possible for the distribution function to be constant over some
interval. In that case the equation $u = F_X(x)$ will have multiple solutions
over that interval. Our convention (to be justified shortly) is to choose the
largest possible value in the interval.


[Fig. 17.1 Inversion of the distribution function for Example 17.2]


Example 17.3 Suppose

$$F_X(x) = \begin{cases} 0.5x, & 0 \le x < 1, \\ 0.5, & 1 \le x < 2, \\ 0.5x - 0.5, & 2 \le x < 3. \end{cases}$$

Determine the simulated values of $x$ resulting from the uniform numbers 0.3,
0.5, and 0.9.


The first interval covers values of the distribution function from 0 to 0.5
and the final interval covers the range 0.5 to 1. For $u = 0.3$ use the first
interval and solve $0.3 = 0.5x$ for $x = 0.6$. The function is constant at 0.5 from
1 to 2 and so for $u = 0.5$ choose the largest value, $x = 2$. For $u = 0.9$ use the
final interval and solve $0.9 = 0.5x - 0.5$ for $x = 2.8$. □
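The two conventions (choose the jump point when $u$ falls in a jump, choose the largest value when the cdf is flat) translate directly into branching logic. The Python sketch below hard-codes the distribution functions of Examples 17.2 and 17.3; the function names are illustrative only.

```python
def simulate_example_17_2(u):
    """F(x) = 0.5x on [0, 1), jump from 0.5 to 0.75 at x = 1, 0.5 + 0.25x on [1, 2]."""
    if u < 0.5:
        return u / 0.5          # continuous piece: solve u = 0.5x
    if u < 0.75:
        return 1.0              # u falls in the jump, so return the jump point
    return (u - 0.5) / 0.25     # solve u = 0.5 + 0.25x

def simulate_example_17_3(u):
    """F(x) = 0.5x on [0, 1), 0.5 on [1, 2), 0.5x - 0.5 on [2, 3)."""
    if u < 0.5:
        return u / 0.5          # solve u = 0.5x
    # For u = 0.5 this returns 2, the largest value on the flat stretch.
    return (u + 0.5) / 0.5      # solve u = 0.5x - 0.5

values_17_2 = [simulate_example_17_2(u) for u in (0.3, 0.6, 0.9)]
values_17_3 = [simulate_example_17_3(u) for u in (0.3, 0.5, 0.9)]
```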


Discrete distributions have both features. The distribution function jumps
at the possible values of the variable and is constant in between.


Example 17.4 Simulate values from a binomial distribution with $m = 4$ and
$q = 0.5$ using the uniform numbers 0.3, 0.6875, and 0.95.


The distribution function is

$$F_X(x) = \begin{cases} 0, & x < 0, \\ 0.0625, & 0 \le x < 1, \\ 0.3125, & 1 \le x < 2, \\ 0.6875, & 2 \le x < 3, \\ 0.9375, & 3 \le x < 4, \\ 1, & x \ge 4. \end{cases}$$

For $u = 0.3$, the function is jumping at $x = 1$. For $u = 0.6875$, the function is
constant from 2 to 3 (as the limiting value of the interval) and so $x = 3$. For
$u = 0.95$, the function is jumping at $x = 4$. It is usually easier to present the
simulation algorithm using a table based on the distribution function. Then
a simple table lookup function (such as the VLOOKUP function in Excel®)
can be used to obtain simulated values. For this example, the table is as
follows.

For u in this range,        the simulated value is
0 ≤ u < 0.0625,                       0
0.0625 ≤ u < 0.3125,                  1
0.3125 ≤ u < 0.6875,                  2
0.6875 ≤ u < 0.9375,                  3
0.9375 ≤ u < 1,                       4

Many random number generators can produce a value of 0 but not a value of
1 (though some produce neither one). This is the motivation for choosing the
largest value in an interval where the cdf is constant. □
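The table lookup described in Example 17.4 can be sketched in Python with a binary search over the cumulative probabilities. The use of `bisect_right` is one possible stand-in for a spreadsheet lookup; the helper names are hypothetical.

```python
from bisect import bisect_right
from math import comb

def binomial_cdf_table(m, q):
    """Cumulative probabilities F(0), F(1), ..., F(m)."""
    table, running = [], 0.0
    for k in range(m + 1):
        running += comb(m, k) * q**k * (1 - q) ** (m - k)
        table.append(running)
    return table

def lookup(table, u):
    """Smallest k with u < F(k).

    A u inside a jump maps to the jump point, and a u equal to a flat value of
    the cdf maps to the right end of the flat stretch. The min() guards
    against rounding that leaves the last table entry slightly below 1.
    """
    return min(bisect_right(table, u), len(table) - 1)

table = binomial_cdf_table(4, 0.5)
values = [lookup(table, u) for u in (0.3, 0.6875, 0.95)]
```

With $m = 4$ and $q = 0.5$ the cumulative probabilities are exact binary fractions, so the boundary case $u = 0.6875$ is resolved exactly as in the table.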


The second question is: What value of n should be used? We know that 
any consistent estimator will be arbitrarily close to the true value with high 
probability as the sample size is increased. In particular, empirical estimators 
have this attribute. With a little effort we should be able to determine the 
value of n that will get us as close as we want with a specified probability. 
Often, the central limit theorem will help, as in the following example. 


Example 17.5 (Example 17.1 continued) Use simulation to estimate the
mean, $F_X(1{,}000)$, and $\pi_{0.9}$, the 90th percentile of the Pareto distribution with
$\alpha = 3$ and $\theta = 1{,}000$. In each case, stop the simulations when you are 95%
confident that the answer is within ±1% of the true value.


In this example we know the values. Here, $\mu = 500$, $F_X(1{,}000) = 0.875$,
and $\pi_{0.9} = 1{,}154.43$. For instructional purposes we will behave as if we do not
know these values.

The empirical estimate of $\mu$ is $\bar{x}$. The central limit theorem tells us that
for a sample of size $n$

$$\begin{aligned}
0.95 &= \Pr(0.99\mu \le \bar{X}_n \le 1.01\mu) \\
&= \Pr\left(-\frac{0.01\mu}{\sigma/\sqrt{n}} \le \frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}} \le \frac{0.01\mu}{\sigma/\sqrt{n}}\right) \\
&\doteq \Pr\left(-\frac{0.01\mu\sqrt{n}}{\sigma} \le Z \le \frac{0.01\mu\sqrt{n}}{\sigma}\right),
\end{aligned}$$

where $Z$ has the standard normal distribution. Our goal is achieved when

$$\frac{0.01\mu\sqrt{n}}{\sigma} = 1.96, \tag{17.1}$$

which means $n = 38{,}416(\sigma/\mu)^2$. Because we do not know the values of $\sigma$
and $\mu$, we estimate them with the sample standard deviation and mean. The
estimates improve with $n$, so our stopping rule is to cease simulating when

$$n \ge \frac{38{,}416\, s^2}{\bar{x}^2}.$$

For a particular simulation conducted by the authors, the criterion was met
when $n = 106{,}934$, at which point $\bar{x} = 501.15$, a relative error of 0.23%, well
within our goal.

We now turn to the estimation of $F_X(1{,}000)$. The empirical estimate
is the sample proportion at or below 1,000, say $P_n/n$, where $P_n$ is the number
at or below 1,000 after $n$ simulations. The central limit theorem tells us that $P_n/n$
is approximately normal with mean $F_X(1{,}000)$ and variance
$F_X(1{,}000)[1 - F_X(1{,}000)]/n$. Arguing as above, with $P_n/n$ used to estimate
$F_X(1{,}000)$, the requirement will be met when

$$n \ge 38{,}416\,\frac{n - P_n}{P_n}.$$

For our simulation, the criterion was met at $n = 5{,}548$, at which point the
estimate was 4,848/5,548 = 0.87383, which has a relative error of 0.13%.
Finally, for $\pi_{0.9}$, begin with

$$0.95 = \Pr(Y_a \le \pi_{0.9} \le Y_b),$$

where $Y_1 \le Y_2 \le \cdots \le Y_n$ are the order statistics from the simulated sample,
$a = [0.9n - 1.96\sqrt{0.9(0.1)n}]$, $b = [0.9n + 1.96\sqrt{0.9(0.1)n}] + 1$ (where $[\cdot]$ is the
greatest integer function and the 1 is added for conservatism), and the process
terminates when both

$$\hat{\pi}_{0.9} - Y_a \le 0.01\hat{\pi}_{0.9}$$

and

$$Y_b - \hat{\pi}_{0.9} \le 0.01\hat{\pi}_{0.9}.$$

For the example, this occurred when $n = 126{,}364$, and the estimated 90th
percentile is 1,153.97, with a relative error of 0.04%. □
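The stopping rule for the mean can be sketched as a loop that updates running sums and checks the criterion $n \ge 38{,}416\, s^2/\bar{x}^2$ after each draw. This is a minimal Python sketch under stated assumptions: the seed, the function name, and the `min_n` guard (which prevents stopping on an unstable early variance estimate) are not from the text, and the stopping point will differ from the authors' run.

```python
import random

def simulate_mean_until_precise(alpha=3.0, theta=1000.0, z=1.96,
                                rel_tol=0.01, min_n=1000, seed=2004):
    """Simulate Pareto values until the mean meets the relative-error criterion."""
    rng = random.Random(seed)
    factor = (z / rel_tol) ** 2      # about 38,416 for z = 1.96 and 1% tolerance
    n, total, total_sq = 0, 0.0, 0.0
    while True:
        u = rng.random()
        x = theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)   # inversion method
        n += 1
        total += x
        total_sq += x * x
        if n >= min_n:
            mean = total / n
            var = (total_sq - n * mean * mean) / (n - 1)  # sample variance
            if n >= factor * var / (mean * mean):
                return n, mean

n_used, mean_estimate = simulate_mean_until_precise()
```

Checking the rule after every draw mirrors the description in the example; checking only every few thousand draws would be cheaper and changes the stopping point only slightly.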


17.1.2 Exercises 


17.1 Use the inversion method to simulate three values from the Poisson(3)
distribution. Use 0.1247, 0.9321, and 0.6873 for the uniform random numbers.


17.2 Use the uniform random numbers 0.2, 0.5, and 0.7 to simulate values
from

$$f_X(x) = \begin{cases} 0.25, & 0 \le x \le 2, \\ 0.1, & 4 \le x \le 9, \\ 0, & \text{otherwise.} \end{cases}$$

17.3 Demonstrate that $0.95 = \Pr(Y_a \le \pi_{0.9} \le Y_b)$ for $Y_a$ and $Y_b$ as defined
in Example 17.5.


17.4 You are simulating observations from an exponential distribution with
$\theta = 100$. How many simulations are needed to be 90% certain of being within
2% of each of the mean and the probability of being below 200? Conduct the
required number of simulations and note if the 2% goal has been reached.


17.5 Simulate 1,000 observations from a gamma distribution with $\alpha = 2$ and
$\theta = 500$. Perform the chi-square goodness-of-fit and Kolmogorov-Smirnov
tests to see if the simulated values were actually from that distribution.


17.6 (*) To estimate E(X), you have simulated five observations from the 
random variable X. The values are 1, 2, 3, 4, and 5. Your goal is to have the 
standard deviation of the estimate of E(X) be less than 0.05. Estimate the 
total number of simulations needed. 


17.2 EXAMPLES OF SIMULATION IN ACTUARIAL MODELING 


17.2.1 Aggregate loss calculations 


The analytic methods presented in Chapter 6 have two features in common. 
First, they are exact up to the level of the approximation introduced. For 
recursion and the FFT, that involves replacing the true severity distribution 
with an arithmetized approximation. For Heckman—Meyers a histogram ap- 
proximation is required. Furthermore, Heckman—Meyers requires a numerical 
integration. In each case, the errors can be reduced to near zero by increasing 
the number of points used. Second, both recursion and inversion assume that
aggregate claims can be written as $S = X_1 + \cdots + X_N$ with $N, X_1, X_2, \ldots$
independent and the $X_j$s identically distributed.

There is no need to be concerned about the first feature because the
approximation error can be made as small as desired. However, the second
restriction may prevent the model from reflecting reality. In this section we
indicate some common ways in which the independence or identical distribution
assumptions may fail to hold and then demonstrate how simulation can
lead to a solution. When the $X_j$s are i.i.d. it does not matter how we go about
labeling the losses, that is, which loss is called $X_1$, which one $X_2$, and so on.
With the assumption removed, the labels become important. Because $S$ is the
aggregate loss for one year, time is a factor. One way of identifying the losses
is to let $X_1$ be the first loss, $X_2$ be the second loss, and so on. Then let $T_j$
be the random variable that records the time of the $j$th loss. Without going
into much detail about the claims-paying process, we do want to note that $T_j$
may be the time at which the loss occurred, the time it was reported, or the
time payment was made. In the latter two cases it may be that $T_j > 1$, which
occurs when the report of the loss or the payment of the claim takes place at
a time subsequent to the end of the time period of the coverage, usually one
year. If the timing of the losses is important, we will need to know the joint
distribution of $(T_1, T_2, \ldots, X_1, X_2, \ldots)$.


17.2.2 Examples of lack of independence or identical distributions 


There are two common ways to have the assumption fail to hold. One is 
through accounting for time (and in particular the time value of money) and 
the other is through coverage modifications. The latter may have a time factor 
as well. The following examples provide some illustrations. 


Example 17.6 (Time value of loss payments) Suppose the quantity of inter- 
est, S, is the present value of all payments made in respect of a policy issued 
today and covering loss events that occur in the next year. Develop a model 


for S. 


Let $T_j$ be the time of the payment of the $j$th loss. While $T_j$ records the
time of the payment, the subscripts are selected in order of the loss events.
Let $T_j = C_j + L_j$ where $C_j$ is the time of the event and $L_j$ is the time
from occurrence to payment. Assume they are independent and the $L_j$s are
independent of each other. Let the time between events, $C_j - C_{j-1}$ (where
$C_0 = 0$), be i.i.d. with an exponential distribution with mean 0.2 years.

Let $X_j$ be the amount paid at time $T_j$ on the loss that occurred at time
$C_j$. Assume that $X_j$ and $C_j$ are independent (the amount of the claim does
not depend on when in the year it occurred) but $X_j$ and $L_j$ are positively
correlated (a specific distributional model will be specified when the example
is continued). This is reasonable because the more expensive losses may take
longer to settle.

Finally, let $V_t$ be a random variable that represents the value, which, if
invested today, will accumulate to 1 in $t$ years. It is independent of all $X_j$,
$C_j$, and $L_j$. But clearly, for $s \ne t$, $V_s$ and $V_t$ are dependent.

We then have

$$S = \sum_{j=1}^N X_j V_{T_j},$$

where $N = \max_{C_j < 1}\{j\}$. The various dependencies were established in the
development of the random variables. □


Example 17.7 (Out-of-pocket maximum) Suppose there is a deductible, d, 
on individual losses. However, in the course of a year, the policyholder will 
pay no more than u. Develop a model for the insurer’s aggregate payments. 


Let $X_j$ be the amount of the $j$th loss. Here the assignment of $j$ does not
matter. Let $W_j = X_j \wedge d$ be the amount paid by the policyholder due to the
deductible and let $Y_j = X_j - W_j$ be the amount paid by the insurer. Then
$R = W_1 + \cdots + W_N$ is the total amount paid by the policyholder prior to
imposing the out-of-pocket maximum. Then the amount actually paid by the
policyholder is $R_u = R \wedge u$. Let $S = X_1 + \cdots + X_N$ be the total losses, and
then the aggregate amount paid by the insurer is $T = S - R_u$. Note that
the distributions of $S$ and $R_u$ are based on i.i.d. severity distributions. The
analytic methods described earlier can be used to obtain their distributions.
But because they are dependent, their individual distributions cannot be
combined to produce the distribution of $T$. There is also no way to write $T$ as a
random sum of i.i.d. variables. At the beginning of the year, it appears that
$T$ will be the sum of i.i.d. $Y_j$s, but at some point the $Y_j$s may be replaced by
$X_j$s as the out-of-pocket maximum is reached. □


17.2.3 Simulation analysis of the two examples 


We now complete the two examples using the simulation approach. The mod- 
els have been selected arbitrarily, but we should assume they were determined 
by a careful estimation process using the techniques presented earlier in this 
text. 


Example 17.8 (Example 17.6 continued) The model is completed with the
following specifications. The amount of a payment ($X_j$) has the Pareto
distribution with parameters $\alpha = 3$ and $\theta = 1{,}000$. The time from the occurrence
of a claim to its payment ($L_j$) has a Weibull distribution with $\tau = 1.5$ and
$\theta = \ln(X_j)/6$. This models the dependence by having the scale parameter
depend on the size of the loss. The discount factor will be modeled by assuming
that, for $t > s$, $[\ln(V_s/V_t)]/(t - s)$ has a normal distribution with mean 0.06
and variance $0.0004(t-s)$. We do not need to specify a model for the number
of losses. Instead, we use the model given earlier for the time between losses.
Use simulation to determine the expected present value of aggregate payments.


The mechanics of a single simulation will be done in detail, and that should
indicate how the process is to be done. Begin by generating i.i.d. exponential
interloss times until their sum exceeds 1 (in order to obtain one year’s worth of
claims). The individual variates are generated from pseudouniform numbers
using

$$u = 1 - e^{-5x},$$

which yields

$$x = -0.2\ln(1 - u).$$

For the first simulation, the uniform pseudorandom numbers and the
corresponding $x$ values are (0.25373, 0.0585), (0.46750, 0.1260), (0.23709, 0.0541),
(0.75780, 0.2836), and (0.96642, 0.6788). At this point the simulated $x$s total
1.2010 and therefore there are four loss events, occurring at times $c_1 = 0.0585$,
$c_2 = 0.1845$, $c_3 = 0.2386$, and $c_4 = 0.5222$.

The four loss amounts are found from inverting the Pareto cdf. That is,

$$x = 1{,}000[(1 - u)^{-1/3} - 1].$$

The four pseudouniform numbers are 0.71786, 0.47779, 0.61084, and 0.68579.
This produces the four losses $x_1 = 524.68$, $x_2 = 241.80$, $x_3 = 369.70$, and
$x_4 = 470.93$.

The times from occurrence to payment have a Weibull distribution. The
equation to solve is

$$u = 1 - e^{-[6l/\ln(x)]^{1.5}},$$

where $x$ is the loss. Solving for the lag time $l$ yields

$$l = \tfrac{1}{6}\ln(x)[-\ln(1 - u)]^{2/3}.$$

For the first lag we have $u = 0.23376$ and so

$$l_1 = \tfrac{1}{6}\ln(524.68)[-\ln 0.76624]^{2/3} = 0.4320.$$

Similarly, with the next three values of $u$ being 0.85799, 0.12951, and 0.72085,
we have $l_2 = 1.4286$, $l_3 = 0.2640$, and $l_4 = 1.2068$. The payment times of
the four losses are the sum of $c_j$ and $l_j$, namely $t_1 = 0.4905$, $t_2 = 1.6131$,
$t_3 = 0.5026$, and $t_4 = 1.7290$.

Finally, we generate the discount factors. They must be generated in order
of increasing $t_j$ so we first obtain $v_{0.4905}$. We begin with a normal variate
with mean 0.06 and variance 0.0004(0.4905) = 0.0001962. Using inversion,
the simulated value is $0.0592 = [\ln(1/v_{0.4905})]/0.4905$ and so $v_{0.4905} = 0.9714$.
Note that for the first value we have $s = 0$, and $v_0 = 1$. For the second
value we require a normal variate with mean 0.06 and variance
(0.5026 − 0.4905)(0.0004) = 0.00000484. The simulated value is

$$0.0604 = \frac{\ln(0.9714/v_{0.5026})}{0.0121} \quad\text{for } v_{0.5026} = 0.9707.$$

For the next two payments, we have

$$0.0768 = \frac{\ln(0.9707/v_{1.6131})}{1.1105} \quad\text{for } v_{1.6131} = 0.8913,$$

$$0.0628 = \frac{\ln(0.8913/v_{1.7290})}{0.1159} \quad\text{for } v_{1.7290} = 0.8848.$$

We are now ready to determine the first simulated value of the aggregate
present value. It is

$$\begin{aligned}
s_1 &= 524.68(0.9714) + 241.80(0.8913) + 369.70(0.9707) + 470.93(0.8848) \\
&= 1{,}500.74.
\end{aligned}$$

The process was then repeated until there was 95% confidence that the
estimated mean was within 1% of the true mean. This took 26,944 simulations,
producing a sample mean of 2,299.16. □
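The single simulation worked through in Example 17.8 can be reproduced in Python as a check on the arithmetic. The sketch below takes the pseudouniform numbers and the four simulated normal values quoted in the example as inputs rather than generating them; the function names are hypothetical. Because it carries full precision instead of four-decimal roundings, its result agrees with 1,500.74 only approximately.

```python
import math

def exponential_inverse(u, mean=0.2):
    """Interloss time from u = 1 - exp(-x / mean)."""
    return -mean * math.log(1.0 - u)

def pareto_inverse(u, alpha=3.0, theta=1000.0):
    """Loss amount from the Pareto cdf."""
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def weibull_lag(u, loss, tau=1.5):
    """Payment lag from a Weibull cdf whose scale is ln(loss) / 6."""
    scale = math.log(loss) / 6.0
    return scale * (-math.log(1.0 - u)) ** (1.0 / tau)

def present_value(event_u, loss_u, lag_u, rates):
    # Event times: cumulative interloss times that fall within the year.
    times, clock = [], 0.0
    for u in event_u:
        clock += exponential_inverse(u)
        if clock >= 1.0:
            break
        times.append(clock)
    losses = [pareto_inverse(u) for u in loss_u[:len(times)]]
    pay_times = [c + weibull_lag(u, x) for c, u, x in zip(times, lag_u, losses)]
    # Discount factors are built in order of increasing payment time.
    order = sorted(range(len(pay_times)), key=lambda j: pay_times[j])
    total, v, s = 0.0, 1.0, 0.0
    for j, rate in zip(order, rates):
        v *= math.exp(-rate * (pay_times[j] - s))   # rate = ln(v_s / v_t) / (t - s)
        s = pay_times[j]
        total += losses[j] * v
    return total

s1 = present_value(
    event_u=[0.25373, 0.46750, 0.23709, 0.75780, 0.96642],
    loss_u=[0.71786, 0.47779, 0.61084, 0.68579],
    lag_u=[0.23376, 0.85799, 0.12951, 0.72085],
    rates=[0.0592, 0.0604, 0.0768, 0.0628],  # simulated normal values from the example
)
```

A full simulation would draw each rate from a normal distribution with mean 0.06 and variance $0.0004(t - s)$ instead of reading it from a list, and would repeat the whole calculation many times.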


Example 17.9 (Example 17.7 continued) For this erample, set the deductible 
d at 250 and the out-of-pocket maximum at u = 1,000. Assume that the 
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Table 17.2 Negative binomial cumulative probabilities 


n Fy(n) n Fy(n) 
0 0.03704 8 0.76589 
1 0.11111 9 0.81888 
2 0.20988 10 0.86127 
3 0.31962 11 0.89467 
4 0.42936 12 0.92064 
5 0.53178 13 0.94062 
6 0.62282 14 0.95585 
va 0.70086 15 0.96735 


number of losses has the negative binomial distribution with r = 3 and B = 2. 
Further assume that individual losses have the Weibull distribution with T = 2 
and 0 = 600. Determine the 95th percentile of the insurer’s losses. 


In order to simulate the negative binomial claim counts, we require the cdf 
of the negative binomial distribution. There is no closed form, but a table 
can be constructed, and one appears here as Table 17.2. The number of losses 
for the year is generated by obtaining one pseudouniform value—for example, 
u = 0.47515—and then determining the smallest entry in the table that is 
larger than 0.47515. The simulated value appears to its left. In this case our 
first simulation produced n = 5 losses. 

The amounts of the five losses are obtained from the Weibull distribution. 
Inversion of the cdf produces 


z = 600[—In(1 — u)]?/?. 


The five simulated values are 544.04, 453.67, 217.87, 681.98, and 449.83. The
total loss is 2,347.39. The policyholder pays 250.00 + 250.00 + 217.87 + 250.00 +
250.00 = 1,217.87, but the out-of-pocket maximum limits this to 1,000. Thus
our first simulated value has the insurer paying 2,347.39 − 1,000 = 1,347.39.

The goal was set to be 95% confident that the estimated 95th percentile
would be within 2% of the true value. This required 11,476 simulations,
producing an estimated 95th percentile of 6,668.18. □
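
The steps of Example 17.9 can be sketched in code. The following is only an illustration under stated assumptions: the payment rule (the policyholder pays each loss up to d, with total out-of-pocket payments capped at u) is inferred from the arithmetic of the first simulated value, and the function names, the seed, and the fixed run length of 20,000 are hypothetical choices rather than anything prescribed by the text.

```python
import math
import random


def negbin_cdf_table(r, beta, n_max):
    """Cumulative probabilities F_N(0), ..., F_N(n_max), built recursively."""
    a = beta / (1.0 + beta)
    b = (r - 1.0) * beta / (1.0 + beta)
    p = (1.0 + beta) ** (-r)          # p_0
    cdf = [p]
    for k in range(1, n_max + 1):
        p *= a + b / k                # p_k = (a + b/k) p_{k-1}
        cdf.append(cdf[-1] + p)
    return cdf


def simulate_count(u, cdf):
    """Inversion: the smallest n whose cumulative probability exceeds u."""
    for n, big_f in enumerate(cdf):
        if big_f > u:
            return n
    return len(cdf)                   # u fell beyond the tabulated range


def weibull_inverse(u, theta=600.0, tau=2.0):
    """Inversion of the Weibull cdf: x = theta * [-ln(1 - u)]^(1/tau)."""
    return theta * (-math.log(1.0 - u)) ** (1.0 / tau)


def insurer_payment(losses, d=250.0, oop_max=1000.0):
    """Total losses less the policyholder's share (assumed payment rule)."""
    policyholder = min(sum(min(x, d) for x in losses), oop_max)
    return sum(losses) - policyholder


def simulate_insurer_payment(rng, cdf):
    n = simulate_count(rng.random(), cdf)
    losses = [weibull_inverse(rng.random()) for _ in range(n)]
    return insurer_payment(losses)


cdf = negbin_cdf_table(r=3, beta=2, n_max=200)

# Reproduce the first simulated value from the example.
n_first = simulate_count(0.47515, cdf)
first_value = insurer_payment([544.04, 453.67, 217.87, 681.98, 449.83])

# Hypothetical fixed-length run; the text instead stops at a precision goal.
rng = random.Random(2004)
payments = sorted(simulate_insurer_payment(rng, cdf) for _ in range(20000))
q95 = payments[int(0.95 * len(payments))]
```

A fixed run length keeps the sketch short; the stopping rule used in the example (95% confidence of being within 2% of the true percentile) would replace the fixed count in practice.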


17.2.4 Statistical analyses 


Simulation can help in a variety of ways when analyzing data. Two will 
be discussed here, both of which have to do with evaluating a statistical
procedure. The first is the determination of the p-value (or critical value) for 
a hypothesis test. The second is to evaluate the mean-squared error of an 
estimator. We begin with the hypothesis testing situation.


Example 17.10 It is conjectured that losses have a lognormal distribution.
One hundred observations have been collected and the Kolmogorov-Smirnov
test statistic is 0.06272. Determine the p-value for this test, first with the null
hypothesis being that the distribution is lognormal with μ = 7 and σ = 1 and
then with the parameters unspecified.


For the null hypothesis with each parameter specified, one simulation in- 
volves first simulating 100 lognormal observations from the specified lognormal 
distribution. Then the Kolmogorov-Smirnov test statistic is calculated. The 
estimated p-value is the proportion of simulations for which the test statistic 
exceeds 0.06272. After 1000 simulations, the estimate of the p-value is 0.836. 

With the parameters unspecified, it is not clear which lognormal distribution
should be used. It turned out that for the observations actually collected
μ̂ = 7.2201 and σ̂ = 0.80893. These were used as the basis for each simulation.
The only change is that after the simulated observations have been
obtained, the results are compared to a lognormal distribution with parameters
estimated (by maximum likelihood) from the simulated data set. For
1,000 simulations, the test statistic exceeded 0.06272 491 times, for an
estimated p-value of 0.491.

As indicated in Section 13.4.1, not specifying the parameters makes a
considerable difference in the interpretation of the test statistic. □
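
The two simulations in Example 17.10 might be organized as in the sketch below. It is an illustration only: the function names, the seed, and the run length of 500 are hypothetical, and the work is done on the log scale (normal observations), which leaves the Kolmogorov-Smirnov statistic unchanged because the statistic is invariant under a monotone transformation of the data.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical cdf and the model cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


def simulated_p_value(observed, n, mu, sigma, n_sims, rng, refit=False):
    """Proportion of simulated test statistics that exceed the observed one."""
    exceed = 0
    for _ in range(n_sims):
        logs = [rng.gauss(mu, sigma) for _ in range(n)]  # logs of lognormal data
        if refit:
            # Maximum likelihood estimates from the simulated sample.
            m = sum(logs) / n
            s = math.sqrt(sum((y - m) ** 2 for y in logs) / n)
        else:
            m, s = mu, sigma
        d = ks_statistic(logs, lambda y: normal_cdf((y - m) / s))
        if d > observed:
            exceed += 1
    return exceed / n_sims


rng = random.Random(1710)
p_fixed = simulated_p_value(0.06272, 100, 7.0, 1.0, 500, rng)
p_refit = simulated_p_value(0.06272, 100, 7.2201, 0.80893, 500, rng, refit=True)
```

With the parameters re-estimated in each simulation, the fitted model is pulled toward the simulated data, so large values of the statistic become less common and the estimated p-value drops, as the example reports.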


When testing hypotheses, p-values and significance levels are calculated 
assuming the null hypothesis to be true. In other situations, there is no 
known population distribution from which to simulate. For such situations, a 
technique called the bootstrap (see [33] for thorough coverage of this subject) 
may help. The key is to use the empirical distribution from the data as 
the population from which to simulate values. Theoretical arguments show 
that at least asymptotically the bootstrap estimate will converge to the true 
value. This is reasonable because as the sample size increases the empirical 
distribution becomes more and more like the true distribution. The following 
example shows how the bootstrap works and also indicates that, at least in 
the case illustrated, it gives a reasonable answer. 


Example 17.11 A sample (with replacement) of size 3 from a population
produced the values 2, 3, and 7. Determine the bootstrap estimate of the
mean-squared error of the sample mean as an estimator of the population
mean.


The bootstrap approach assumes that the population places probability 1/3
on each of the three values 2, 3, and 7. The mean of this distribution is 4.
From this population there are 27 samples of size 3 that might be drawn.
Sample means can be 2 (sample values 2, 2, 2, with probability 1/27), 7/3 (sample
values 2, 2, 3; 2, 3, 2; and 3, 2, 2, with probability 3/27), and so on, up to 7 with
probability 1/27. The mean-squared error is

$$(2 - 4)^2\left(\tfrac{1}{27}\right) + \left(\tfrac{7}{3} - 4\right)^2\left(\tfrac{3}{27}\right) + \cdots + (7 - 4)^2\left(\tfrac{1}{27}\right) = \frac{14}{9}.$$

The usual approach is to note that the sample mean is unbiased and therefore

$$\mathrm{MSE}(\bar{X}) = \mathrm{Var}(\bar{X}) = \sigma^2/n.$$

With the variance unknown, a reasonable choice is to use the sample variance.
With a denominator of n, for this example, the estimated mean-squared error
is

$$\frac{\tfrac{1}{3}\left[(2 - 4)^2 + (3 - 4)^2 + (7 - 4)^2\right]}{3} = \frac{14}{9},$$

the same as the bootstrap estimate. □
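
The enumeration in Example 17.11 is small enough to check exactly. The sketch below lists all 27 equally likely resamples and averages the squared error of the sample mean in exact rational arithmetic; the function name is a hypothetical choice for illustration.

```python
from fractions import Fraction
from itertools import product


def bootstrap_mse_of_mean(data):
    """Exact bootstrap MSE of the sample mean by full enumeration."""
    n = len(data)
    center = Fraction(sum(data), n)          # mean of the empirical distribution
    total = Fraction(0)
    for resample in product(data, repeat=n):  # n**n equally likely resamples
        total += (Fraction(sum(resample), n) - center) ** 2
    return total / n ** n


mse = bootstrap_mse_of_mean([2, 3, 7])
```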


In many situations, determination of the mean-squared error is not so easy, 
and then the bootstrap becomes an extremely useful tool. While simulation 
was not needed for the example, note that an original sample size of 3 led to 
27 possible bootstrap values. Once the sample size gets beyond 6, it becomes 
impractical to enumerate all the cases. In that case, simulating observations 
from the empirical distribution becomes the only feasible choice. 
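
When enumeration is impractical, the same quantity can be approximated by drawing resamples at random. The following is a minimal sketch, assuming a generic estimator passed in as a function; the names, the seed, and the number of resamples are hypothetical.

```python
import random


def bootstrap_mse(data, estimator, target, n_boot, rng):
    """Average squared error of the estimator over random resamples."""
    n = len(data)
    total = 0.0
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in range(n)]  # draw with replacement
        total += (estimator(resample) - target) ** 2
    return total / n_boot


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(17)
data = [2, 3, 7]
# The target is the mean of the empirical distribution, here 4.
estimate = bootstrap_mse(data, mean, mean(data), 20000, rng)
```

For the three-point sample of Example 17.11 the simulated value should be close to the exact answer of 14/9, which gives a quick check on the procedure before applying it to larger samples.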


Example 17.12 In Example 11.3 an empirical model for time to death was 
obtained. The empirical probabilities are 0.0333, 0.0744, 0.0343, 0.0660, 0.0344, 
and 0.0361 that death is at times 0.8, 2.9, 3.1, 4.0, 4.1, and 4.8 respectively. 
The remaining 0.7215 probability is that the person will be alive five years 
from now. The expected present value for a five-year term insurance policy 
that pays 1,000 at the moment of death is estimated as 


$$1000(0.0333v^{0.8} + \cdots + 0.0361v^{4.8}) = 223.01,$$


where v = 1.07^{-1}. Simulate 10,000 bootstrap samples to estimate the
mean-squared error of this estimator.


A method for conducting a bootstrap simulation with the Kaplan-Meier
estimate is given by Efron [31]. Rather than simulate from the empirical
distribution (as given by the Kaplan-Meier estimate), simulate from the original
sample. In this example, that means assigning probability 1/40 to each of the
original observations. Then each bootstrap observation is a left-truncation
point along with the accompanying censored or uncensored value. After 40
such observations are recorded, the Kaplan-Meier estimate is constructed
from the bootstrap sample and then the quantity of interest computed. This
is relatively easy because the bootstrap estimate can place probability only
at the six original points. Ten thousand simulations were quickly done. The
mean was 222.05 and the mean-squared error was 4,119. Efron also noted
that the bootstrap estimate of the variance of Ŝ(t) is asymptotically equal to
Greenwood’s estimate, thus giving credence to both methods. □


17.2.5 Exercises


17.7 (*) Insurance for a city’s snow removal costs covers four winter months. 
There is a deductible of 10,000 per month. Monthly costs are independent 
and normally distributed with μ = 15,000 and σ = 2,000. Monthly costs
are simulated using the inversion method. For one simulation of a year’s 
payments the four uniform pseudorandom numbers are 0.5398, 0.1151, 0.0013, 
and 0.7881. Calculate the insurer’s cost for this simulated year. 


17.8 (*) After one period, the price of a stock is X times its price at the 
beginning of the period, where X has a lognormal distribution with μ = 0.01
and σ = 0.02. The price at time 0 is 100. The inversion method is used to
simulate price movements. The pseudouniform random numbers are 0.1587 
and 0.9332 for periods 1 and 2. Determine the simulated prices at the end of 
each of the first two periods. 


17.9 (*) You have insured 100 people, each age 70. Each person has prob- 
ability 0.03318 of dying in the next year and the deaths are independent. 
Therefore, the number of deaths has a binomial distribution with m = 100 
and q = 0.03318. Use the inversion method to determine the simulated
number of deaths in the next year based on u = 0.18.


17.10 (*) For a surplus process, claims occur according to a Poisson process 
at the rate of two per year. Thus the time between claims has the exponential
distribution with θ = 1/2. Claims have a Pareto distribution with α = 2 and
θ = 1,000. The initial surplus is 2,000 and premiums are collected at a rate
of 2,200. Ruin occurs any time the surplus is negative, at which time no 
further premiums are collected or claims paid. All simulations are done with 
the inversion method. For the time between claims, use 0.83, 0.54, 0.48, and 
0.14 as the pseudorandom numbers. For claim amounts use 0.89, 0.36, 0.70, 
and 0.61. Determine the surplus at time 1. 


17.11 (*) You are given a random sample of size 2 from some distribution. 
The values are 1 and 3. You plan to estimate the population variance with 
the estimator [(X_1 − X̄)² + (X_2 − X̄)²]/2. Determine the bootstrap estimate
of the mean-squared error of this estimator. 


17.12 A sample of three items from the uniform(0,10) distribution produced 
the following values: 2, 4, and 7. 


(a) Calculate the Kolmogorov—Smirnov test statistic for the null hy- 
pothesis that the data came from the uniform(0,10) distribution. 


(b) Simulate 10,000 samples of size 3 from the uniform(0,10) distribu- 
tion and compute the Kolmogorov—Smirnov test statistic for each. 
The proportion of times the value equals or exceeds your answer 
to part (a) is an estimate of the p-value.


17.13 A sample of three items from the uniform(0, θ) distribution produced
the following values: 2, 4, and 7. Consider the estimator of θ,


$$\hat{\theta} = \tfrac{4}{3}\max(x_1, x_2, x_3).$$


In Example 9.15 the mean-squared error of this unbiased estimator was
shown to be θ²/15.


(a) Estimate the mean-squared error by replacing θ with its estimate.


(b) Obtain the bootstrap estimate of the variance of the estimator. (It
is not possible to use the bootstrap to estimate the mean-squared
error because you cannot obtain the true value of θ from the
empirical distribution, but you can obtain the expected value of the
estimator.)


Appendix A 


An inventory of 
continuous distributions 


A.1 INTRODUCTION 


Descriptions of the models are given below. First a few mathematical prelimi- 
naries are presented that indicate how the various quantities can be computed. 
The incomplete gamma function¹ is given by

$$\Gamma(\alpha; x) = \frac{1}{\Gamma(\alpha)} \int_0^x t^{\alpha-1} e^{-t}\,dt, \quad \alpha > 0,\ x > 0,$$

with

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\,dt, \quad \alpha > 0.$$


¹Some references, such as [3], denote this integral P(a, x) and define
Γ(a, x) = ∫_x^∞ t^{a−1} e^{−t} dt. Note that this definition does not normalize
by dividing by Γ(a). When using software to evaluate the incomplete gamma
function, be sure to note how it is defined.
Loss Models: From Data to Decisions, Second Edition.

By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot

ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.

Also, define

$$G(\alpha; x) = \int_x^\infty t^{\alpha-1} e^{-t}\,dt, \quad x > 0.$$

At times we will need this integral for nonpositive values of α. Integration by
parts produces the relationship

$$G(\alpha; x) = -\frac{x^{\alpha} e^{-x}}{\alpha} + \frac{1}{\alpha}\,G(\alpha + 1; x).$$

This can be repeated until the first argument of G is α + k, a positive number.
Then it can be evaluated from

$$G(\alpha + k; x) = \Gamma(\alpha + k)\left[1 - \Gamma(\alpha + k; x)\right].$$

However, if α is a negative integer or zero, the value of G(0; x) is needed. It
is

$$G(0; x) = \int_x^\infty t^{-1} e^{-t}\,dt = E_1(x),$$

which is called the exponential integral. A series expansion for this integral
is

$$E_1(x) = -0.57721566490153 - \ln x - \sum_{n=1}^{\infty} \frac{(-1)^n x^n}{n(n!)}.$$

When α is a positive integer, the incomplete gamma function can be
evaluated exactly as given in the following theorem.


Theorem A.1 For integer α,

$$\Gamma(\alpha; x) = 1 - \sum_{j=0}^{\alpha-1} \frac{x^j e^{-x}}{j!}.$$


Proof: For α = 1, Γ(1; x) = ∫_0^x e^{−t} dt = 1 − e^{−x}, and so the theorem is
true for this case. The proof is completed by induction. Assume it is true for
α = 1, ..., n. Then

$$\begin{aligned}
\Gamma(n + 1; x) &= \frac{1}{n!}\int_0^x t^{n} e^{-t}\,dt \\
&= \frac{1}{n!}\left(-t^{n} e^{-t}\Big|_0^x + \int_0^x n t^{n-1} e^{-t}\,dt\right) \\
&= \frac{1}{n!}\left(-x^{n} e^{-x}\right) + \Gamma(n; x) \\
&= 1 - \sum_{j=0}^{n} \frac{x^j e^{-x}}{j!}. \qquad \square
\end{aligned}$$


The incomplete beta function is given by


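$$\beta(a, b; x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \int_0^x t^{a-1}(1 - t)^{b-1}\,dt, \quad a > 0,\ b > 0,\ 0 < x < 1,$$

and when b < 0 (but a > 1 + ⌊−b⌋), repeated integration by parts produces

$$\begin{aligned}
\Gamma(a)\Gamma(b)\beta(a, b; x) = {} & -\Gamma(a + b)\left[\frac{x^{a-1}(1 - x)^{b}}{b} + \frac{(a - 1)x^{a-2}(1 - x)^{b+1}}{b(b + 1)} + \cdots \right. \\
& \left. {} + \frac{(a - 1)\cdots(a - r)\,x^{a-r-1}(1 - x)^{b+r}}{b(b + 1)\cdots(b + r)}\right] \\
& {} + \frac{(a - 1)\cdots(a - r - 1)}{b(b + 1)\cdots(b + r)}\,\Gamma(a - r - 1) \\
& \quad \times \Gamma(b + r + 1)\,\beta(a - r - 1, b + r + 1; x),
\end{aligned}$$

where r is the smallest integer such that b + r + 1 > 0. The first argument
must be positive (that is, a − r − 1 > 0).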

Numerical approximations for both the incomplete gamma and the incom- 
plete beta function are available in many statistical computing packages as 
well as in many spreadsheets because they are just the distribution functions 
of the gamma and beta distributions. The following approximations are taken 
from [3]. The suggestion regarding using different formulas for small and large 
x when evaluating the incomplete gamma function is from [107]. That refer- 
ence also contains computer subroutines for evaluating these expressions. In 
particular, it provides an effective way of evaluating continued fractions. 

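For x ≤ α + 1 use the series expansion

$$\Gamma(\alpha; x) = \frac{x^{\alpha} e^{-x}}{\Gamma(\alpha)} \sum_{n=0}^{\infty} \frac{x^{n}}{\alpha(\alpha + 1)\cdots(\alpha + n)},$$

while for x > α + 1 use the continued-fraction expansion

$$1 - \Gamma(\alpha; x) = \frac{x^{\alpha} e^{-x}}{\Gamma(\alpha)}\,\cfrac{1}{x + \cfrac{1 - \alpha}{1 + \cfrac{1}{x + \cfrac{2 - \alpha}{1 + \cfrac{2}{x + \cdots}}}}}.$$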

The incomplete gamma function can also be used to produce cumulative
probabilities from the standard normal distribution. Let Φ(z) = Pr(Z ≤ z),
where Z has the standard normal distribution. Then, for z ≥ 0, Φ(z) =
0.5 + Γ(0.5; z²/2)/2 while, for z < 0, Φ(z) = 1 − Φ(−z).
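
As an illustration of the approach just described, the sketch below evaluates the incomplete gamma function with the series for small x and a continued fraction for large x, and then uses it for standard normal probabilities. Several choices here are assumptions rather than the text's prescription: the continued fraction is evaluated in an equivalent contracted form with the modified Lentz method, the tolerance and iteration limit are arbitrary, and the function names are hypothetical.

```python
import math


def inc_gamma(alpha, x, eps=1e-14, max_iter=10000):
    """Normalized incomplete gamma function Gamma(alpha; x), alpha > 0."""
    if x <= 0.0:
        return 0.0
    # log of x^alpha e^{-x} / Gamma(alpha)
    log_prefactor = alpha * math.log(x) - x - math.lgamma(alpha)
    if x <= alpha + 1.0:
        # Series: sum over n >= 0 of x^n / (alpha (alpha+1) ... (alpha+n)).
        term = 1.0 / alpha
        total = term
        for n in range(1, max_iter):
            term *= x / (alpha + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return math.exp(log_prefactor) * total
    # Continued fraction for 1 - Gamma(alpha; x), modified Lentz evaluation.
    tiny = 1e-300
    b = x + 1.0 - alpha
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - alpha)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return 1.0 - math.exp(log_prefactor) * h


def normal_cdf(z):
    """Standard normal cdf from the incomplete gamma function."""
    if z < 0.0:
        return 1.0 - normal_cdf(-z)
    return 0.5 + inc_gamma(0.5, z * z / 2.0) / 2.0
```

For integer α the result can be compared with Theorem A.1; for example, `inc_gamma(3, 2.0)` should agree with 1 − 5e^{−2}.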
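The incomplete beta function can be evaluated by the series expansion

$$\beta(a, b; x) = \frac{\Gamma(a + b)\,x^{a}(1 - x)^{b}}{a\,\Gamma(a)\Gamma(b)} \left[1 + \sum_{n=0}^{\infty} \frac{(a + b)(a + b + 1)\cdots(a + b + n)}{(a + 1)(a + 2)\cdots(a + n + 1)}\,x^{n+1}\right].$$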


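The gamma function itself can be found from

$$\begin{aligned}
\ln\Gamma(\alpha) = {} & \left(\alpha - \tfrac{1}{2}\right)\ln\alpha - \alpha + \frac{\ln(2\pi)}{2} \\
& {} + \frac{1}{12\alpha} - \frac{1}{360\alpha^{3}} + \frac{1}{1{,}260\alpha^{5}} - \frac{1}{1{,}680\alpha^{7}} + \frac{1}{1{,}188\alpha^{9}} - \frac{691}{360{,}360\alpha^{11}} \\
& {} + \frac{1}{156\alpha^{13}} - \frac{3{,}617}{122{,}400\alpha^{15}} + \frac{43{,}867}{244{,}188\alpha^{17}} - \frac{174{,}611}{125{,}400\alpha^{19}}.
\end{aligned}$$

For values of α above 10 the error is less than 10^{-19}. For values below 10 use
the relationship

$$\ln\Gamma(\alpha) = \ln\Gamma(\alpha + 1) - \ln\alpha.$$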
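
The expansion and the recursive relationship can be combined as in the following sketch. It is only an illustration: the function name is hypothetical, and double-precision arithmetic limits the attainable accuracy to roughly 15 significant digits regardless of the series error bound.

```python
import math

# (numerator, denominator) of the coefficients of alpha^-1, alpha^-3, ..., alpha^-19.
_COEFFICIENTS = [
    (1, 12), (-1, 360), (1, 1260), (-1, 1680), (1, 1188),
    (-691, 360360), (1, 156), (-3617, 122400),
    (43867, 244188), (-174611, 125400),
]


def ln_gamma(alpha):
    """Natural log of the gamma function for alpha > 0."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    # Shift the argument up to at least 10 using
    # ln Gamma(alpha) = ln Gamma(alpha + 1) - ln(alpha).
    shift = 0.0
    while alpha < 10.0:
        shift += math.log(alpha)
        alpha += 1.0
    value = (alpha - 0.5) * math.log(alpha) - alpha + 0.5 * math.log(2.0 * math.pi)
    power = alpha
    for num, den in _COEFFICIENTS:
        value += num / (den * power)
        power *= alpha * alpha
    return value - shift
```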


The distributions are presented in the following way. First the name is given 
along with the parameters. Many of the distributions have other names, which 
are noted in parentheses. Next the density function f(x) and distribution
function F(x) are given. For some distributions, formulas for starting values 
are given. Within each family the distributions are presented in decreasing 
order with regard to the number of parameters. The Greek letters used are 
selected to be consistent. Any Greek letter that is not used in the distribution 
means that that distribution is a special case of one with more parameters 
but with the missing parameters set equal to 1. Unless specifically indicated, 
all parameters must be positive. 

Except for two distributions, inflation can be recognized by simply inflating
the scale parameter θ. That is, if X has a particular distribution, then cX
has the same distribution type, with all parameters unchanged except θ is
changed to cθ. For the lognormal distribution, μ changes to μ + ln(c) with σ
unchanged, while for the inverse Gaussian both μ and θ are multiplied by c.

For several of the distributions, starting values are suggested. They are not
necessarily good estimators, just places from which to start an iterative
procedure to maximize the likelihood or other objective function. These are found
by either the method of moments or percentile matching. The quantities
used are:

$$\text{Moments:}\quad m = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad t = \frac{1}{n}\sum_{i=1}^{n} x_i^2,$$

Percentile matching: p = 25th percentile, q = 75th percentile.


For grouped data or data that have been truncated or censored, these 
quantities may have to be approximated. Because the purpose is to obtain 
starting values and not a useful estimate, it is often sufficient to just ignore 
modifications. For three- and four-parameter distributions, starting values 
can be obtained by using estimates from a special case, then making the new 
parameters equal to 1. An all-purpose starting value rule (for when all else 
fails) is to set the scale parameter (θ) equal to the mean and set all other
parameters equal to 2. 

All the distributions listed here (and many more) are discussed in great 


detail in [73]. In many cases, alternatives to maximum likelihood estimators 
are presented. 


A.2 TRANSFORMED BETA FAMILY 


A.2.1 Four-parameter distribution 
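A.2.1.1 Transformed beta—α, θ, γ, τ (generalized beta of the second kind,
Pearson Type VI)

$$\begin{aligned}
f(x) &= \frac{\Gamma(\alpha + \tau)}{\Gamma(\alpha)\Gamma(\tau)}\,\frac{\gamma (x/\theta)^{\gamma\tau}}{x[1 + (x/\theta)^{\gamma}]^{\alpha+\tau}}, \\
F(x) &= \beta(\tau, \alpha; u), \quad u = \frac{(x/\theta)^{\gamma}}{1 + (x/\theta)^{\gamma}}, \\
E[X^k] &= \frac{\theta^k\,\Gamma(\tau + k/\gamma)\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)\Gamma(\tau)}, \quad -\tau\gamma < k < \alpha\gamma, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(\tau + k/\gamma)\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)\Gamma(\tau)}\,\beta(\tau + k/\gamma, \alpha - k/\gamma; u) \\
&\quad + x^k[1 - F(x)], \quad k > -\tau\gamma, \\
\text{Mode} &= \theta\left(\frac{\tau\gamma - 1}{\alpha\gamma + 1}\right)^{1/\gamma}, \quad \tau\gamma > 1, \text{ else } 0.
\end{aligned}$$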


A.2.2 Three-parameter distributions 
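A.2.2.1 Generalized Pareto—α, θ, τ (beta of the second kind)

$$\begin{aligned}
f(x) &= \frac{\Gamma(\alpha + \tau)}{\Gamma(\alpha)\Gamma(\tau)}\,\frac{\theta^{\alpha} x^{\tau-1}}{(x + \theta)^{\alpha+\tau}}, \\
F(x) &= \beta(\tau, \alpha; u), \quad u = \frac{x}{x + \theta}, \\
E[X^k] &= \frac{\theta^k\,\Gamma(\tau + k)\Gamma(\alpha - k)}{\Gamma(\alpha)\Gamma(\tau)}, \quad -\tau < k < \alpha, \\
E[X^k] &= \frac{\theta^k\,\tau(\tau + 1)\cdots(\tau + k - 1)}{(\alpha - 1)\cdots(\alpha - k)} \quad \text{if } k \text{ is an integer}, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(\tau + k)\Gamma(\alpha - k)}{\Gamma(\alpha)\Gamma(\tau)}\,\beta(\tau + k, \alpha - k; u) \\
&\quad + x^k[1 - F(x)], \quad k > -\tau, \\
\text{Mode} &= \theta\,\frac{\tau - 1}{\alpha + 1}, \quad \tau > 1, \text{ else } 0.
\end{aligned}$$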


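A.2.2.2 Burr—α, θ, γ (Burr Type XII, Singh-Maddala)

$$\begin{aligned}
f(x) &= \frac{\alpha\gamma (x/\theta)^{\gamma}}{x[1 + (x/\theta)^{\gamma}]^{\alpha+1}}, \\
F(x) &= 1 - u^{\alpha}, \quad u = \frac{1}{1 + (x/\theta)^{\gamma}}, \\
E[X^k] &= \frac{\theta^k\,\Gamma(1 + k/\gamma)\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)}, \quad -\gamma < k < \alpha\gamma, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(1 + k/\gamma)\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)}\,\beta(1 + k/\gamma, \alpha - k/\gamma; 1 - u) \\
&\quad + x^k u^{\alpha}, \quad k > -\gamma, \\
\text{Mode} &= \theta\left(\frac{\gamma - 1}{\alpha\gamma + 1}\right)^{1/\gamma}, \quad \gamma > 1, \text{ else } 0.
\end{aligned}$$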


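A.2.2.3 Inverse Burr—τ, θ, γ (Dagum)

$$\begin{aligned}
f(x) &= \frac{\tau\gamma (x/\theta)^{\gamma\tau}}{x[1 + (x/\theta)^{\gamma}]^{\tau+1}}, \\
F(x) &= u^{\tau}, \quad u = \frac{(x/\theta)^{\gamma}}{1 + (x/\theta)^{\gamma}}, \\
E[X^k] &= \frac{\theta^k\,\Gamma(\tau + k/\gamma)\Gamma(1 - k/\gamma)}{\Gamma(\tau)}, \quad -\tau\gamma < k < \gamma, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(\tau + k/\gamma)\Gamma(1 - k/\gamma)}{\Gamma(\tau)}\,\beta(\tau + k/\gamma, 1 - k/\gamma; u) \\
&\quad + x^k[1 - u^{\tau}], \quad k > -\tau\gamma, \\
\text{Mode} &= \theta\left(\frac{\tau\gamma - 1}{\gamma + 1}\right)^{1/\gamma}, \quad \tau\gamma > 1, \text{ else } 0.
\end{aligned}$$


A.2.3 Two-parameter distributions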
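A.2.3.1 Pareto—α, θ (Pareto Type II, Lomax)

$$\begin{aligned}
f(x) &= \frac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha+1}}, \\
F(x) &= 1 - \left(\frac{\theta}{x + \theta}\right)^{\alpha}, \\
E[X^k] &= \frac{\theta^k\,\Gamma(k + 1)\Gamma(\alpha - k)}{\Gamma(\alpha)}, \quad -1 < k < \alpha, \\
E[X^k] &= \frac{\theta^k\,k!}{(\alpha - 1)\cdots(\alpha - k)} \quad \text{if } k \text{ is an integer}, \\
E[X \wedge x] &= \frac{\theta}{\alpha - 1}\left[1 - \left(\frac{\theta}{x + \theta}\right)^{\alpha-1}\right], \quad \alpha \ne 1, \\
E[X \wedge x] &= -\theta \ln\left(\frac{\theta}{x + \theta}\right), \quad \alpha = 1, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(k + 1)\Gamma(\alpha - k)}{\Gamma(\alpha)}\,\beta[k + 1, \alpha - k; x/(x + \theta)] \\
&\quad + x^k\left(\frac{\theta}{x + \theta}\right)^{\alpha}, \quad \text{all } k, \\
\text{Mode} &= 0, \\
\hat{\alpha} &= 2\,\frac{t - m^2}{t - 2m^2}, \quad \hat{\theta} = \frac{mt}{t - 2m^2}.
\end{aligned}$$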

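A.2.3.2 Inverse Pareto—τ, θ

$$\begin{aligned}
f(x) &= \frac{\tau\theta x^{\tau-1}}{(x + \theta)^{\tau+1}}, \\
F(x) &= \left(\frac{x}{x + \theta}\right)^{\tau}, \\
E[X^k] &= \frac{\theta^k\,\Gamma(\tau + k)\Gamma(1 - k)}{\Gamma(\tau)}, \quad -\tau < k < 1, \\
E[X^k] &= \frac{\theta^k\,(-k)!}{(\tau - 1)\cdots(\tau + k)} \quad \text{if } k \text{ is a negative integer}, \\
E[(X \wedge x)^k] &= \theta^k \tau \int_0^{x/(x+\theta)} y^{\tau+k-1}(1 - y)^{-k}\,dy \\
&\quad + x^k\left[1 - \left(\frac{x}{x + \theta}\right)^{\tau}\right], \quad k > -\tau, \\
\text{Mode} &= \theta\,\frac{\tau - 1}{2}, \quad \tau > 1, \text{ else } 0.
\end{aligned}$$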


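A.2.3.3 Loglogistic—γ, θ (Fisk)

$$\begin{aligned}
f(x) &= \frac{\gamma (x/\theta)^{\gamma}}{x[1 + (x/\theta)^{\gamma}]^{2}}, \\
F(x) &= u, \quad u = \frac{(x/\theta)^{\gamma}}{1 + (x/\theta)^{\gamma}}, \\
E[X^k] &= \theta^k\,\Gamma(1 + k/\gamma)\Gamma(1 - k/\gamma), \quad -\gamma < k < \gamma, \\
E[(X \wedge x)^k] &= \theta^k\,\Gamma(1 + k/\gamma)\Gamma(1 - k/\gamma)\,\beta(1 + k/\gamma, 1 - k/\gamma; u) \\
&\quad + x^k(1 - u), \quad k > -\gamma, \\
\text{Mode} &= \theta\left(\frac{\gamma - 1}{\gamma + 1}\right)^{1/\gamma}, \quad \gamma > 1, \text{ else } 0, \\
\hat{\gamma} &= \frac{2\ln(3)}{\ln(q) - \ln(p)}, \quad \hat{\theta} = \exp\left(\frac{\ln(q) + \ln(p)}{2}\right).
\end{aligned}$$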


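A.2.3.4 Paralogistic—α, θ This is a Burr distribution with γ = α.

$$\begin{aligned}
f(x) &= \frac{\alpha^2 (x/\theta)^{\alpha}}{x[1 + (x/\theta)^{\alpha}]^{\alpha+1}}, \\
F(x) &= 1 - u^{\alpha}, \quad u = \frac{1}{1 + (x/\theta)^{\alpha}}, \\
E[X^k] &= \frac{\theta^k\,\Gamma(1 + k/\alpha)\Gamma(\alpha - k/\alpha)}{\Gamma(\alpha)}, \quad -\alpha < k < \alpha^2, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(1 + k/\alpha)\Gamma(\alpha - k/\alpha)}{\Gamma(\alpha)}\,\beta(1 + k/\alpha, \alpha - k/\alpha; 1 - u) \\
&\quad + x^k u^{\alpha}, \quad k > -\alpha, \\
\text{Mode} &= \theta\left(\frac{\alpha - 1}{\alpha^2 + 1}\right)^{1/\alpha}, \quad \alpha > 1, \text{ else } 0.
\end{aligned}$$

Starting values can use estimates from the loglogistic (use γ for α) or Pareto
(use α) distributions.


A.2.3.5 Inverse paralogistic—τ, θ This is an inverse Burr distribution with
γ = τ.

$$\begin{aligned}
f(x) &= \frac{\tau^2 (x/\theta)^{\tau^2}}{x[1 + (x/\theta)^{\tau}]^{\tau+1}}, \\
F(x) &= u^{\tau}, \quad u = \frac{(x/\theta)^{\tau}}{1 + (x/\theta)^{\tau}}, \\
E[X^k] &= \frac{\theta^k\,\Gamma(\tau + k/\tau)\Gamma(1 - k/\tau)}{\Gamma(\tau)}, \quad -\tau^2 < k < \tau, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(\tau + k/\tau)\Gamma(1 - k/\tau)}{\Gamma(\tau)}\,\beta(\tau + k/\tau, 1 - k/\tau; u) \\
&\quad + x^k[1 - u^{\tau}], \quad k > -\tau^2, \\
\text{Mode} &= \theta(\tau - 1)^{1/\tau}, \quad \tau > 1, \text{ else } 0.
\end{aligned}$$

Starting values can use estimates from the loglogistic (use γ for τ) or inverse
Pareto (use τ) distributions.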


A.3 TRANSFORMED GAMMA FAMILY 


A.3.1  Three-parameter distributions 


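A.3.1.1 Transformed gamma—α, θ, τ (generalized gamma)

$$\begin{aligned}
f(x) &= \frac{\tau u^{\alpha} e^{-u}}{x\Gamma(\alpha)}, \quad u = (x/\theta)^{\tau}, \\
F(x) &= \Gamma(\alpha; u), \\
E[X^k] &= \frac{\theta^k\,\Gamma(\alpha + k/\tau)}{\Gamma(\alpha)}, \quad k > -\alpha\tau, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(\alpha + k/\tau)}{\Gamma(\alpha)}\,\Gamma(\alpha + k/\tau; u) + x^k[1 - \Gamma(\alpha; u)], \quad k > -\alpha\tau, \\
\text{Mode} &= \theta\left(\frac{\alpha\tau - 1}{\tau}\right)^{1/\tau}, \quad \alpha\tau > 1, \text{ else } 0.
\end{aligned}$$


A.3.1.2 Inverse transformed gamma—α, θ, τ (inverse generalized gamma)

$$\begin{aligned}
f(x) &= \frac{\tau u^{\alpha} e^{-u}}{x\Gamma(\alpha)}, \quad u = (\theta/x)^{\tau}, \\
F(x) &= 1 - \Gamma(\alpha; u), \\
E[X^k] &= \frac{\theta^k\,\Gamma(\alpha - k/\tau)}{\Gamma(\alpha)}, \quad k < \alpha\tau, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(\alpha - k/\tau)}{\Gamma(\alpha)}\,[1 - \Gamma(\alpha - k/\tau; u)] + x^k\,\Gamma(\alpha; u) \\
&= \frac{\theta^k\,G(\alpha - k/\tau; u)}{\Gamma(\alpha)} + x^k\,\Gamma(\alpha; u), \quad \text{all } k, \\
\text{Mode} &= \theta\left(\frac{\tau}{\alpha\tau + 1}\right)^{1/\tau}.
\end{aligned}$$


A.3.2 Two-parameter distributions

A.3.2.1 Gamma—α, θ

$$\begin{aligned}
f(x) &= \frac{(x/\theta)^{\alpha} e^{-x/\theta}}{x\Gamma(\alpha)}, \\
F(x) &= \Gamma(\alpha; x/\theta), \\
E[X^k] &= \frac{\theta^k\,\Gamma(\alpha + k)}{\Gamma(\alpha)}, \quad k > -\alpha, \\
E[X^k] &= \theta^k(\alpha + k - 1)\cdots\alpha \quad \text{if } k \text{ is an integer}, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(\alpha + k)}{\Gamma(\alpha)}\,\Gamma(\alpha + k; x/\theta) + x^k[1 - \Gamma(\alpha; x/\theta)], \quad k > -\alpha, \\
E[(X \wedge x)^k] &= \alpha(\alpha + 1)\cdots(\alpha + k - 1)\,\theta^k\,\Gamma(\alpha + k; x/\theta) \\
&\quad + x^k[1 - \Gamma(\alpha; x/\theta)] \quad \text{if } k \text{ is an integer}, \\
M(t) &= (1 - \theta t)^{-\alpha}, \quad t < 1/\theta, \\
\text{Mode} &= \theta(\alpha - 1), \quad \alpha > 1, \text{ else } 0, \\
\hat{\alpha} &= \frac{m^2}{t - m^2}, \quad \hat{\theta} = \frac{t - m^2}{m}.
\end{aligned}$$


A.3.2.2 Inverse gamma—α, θ (Vinci)

$$\begin{aligned}
f(x) &= \frac{(\theta/x)^{\alpha} e^{-\theta/x}}{x\Gamma(\alpha)}, \\
F(x) &= 1 - \Gamma(\alpha; \theta/x), \\
E[X^k] &= \frac{\theta^k\,\Gamma(\alpha - k)}{\Gamma(\alpha)}, \quad k < \alpha, \\
E[X^k] &= \frac{\theta^k}{(\alpha - 1)\cdots(\alpha - k)} \quad \text{if } k \text{ is an integer}, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(\alpha - k)}{\Gamma(\alpha)}\,[1 - \Gamma(\alpha - k; \theta/x)] + x^k\,\Gamma(\alpha; \theta/x) \\
&= \frac{\theta^k\,G(\alpha - k; \theta/x)}{\Gamma(\alpha)} + x^k\,\Gamma(\alpha; \theta/x), \quad \text{all } k, \\
\text{Mode} &= \theta/(\alpha + 1), \\
\hat{\alpha} &= \frac{2t - m^2}{t - m^2}, \quad \hat{\theta} = \frac{mt}{t - m^2}.
\end{aligned}$$


A.3.2.3 Weibull—θ, τ

$$\begin{aligned}
f(x) &= \frac{\tau (x/\theta)^{\tau} e^{-(x/\theta)^{\tau}}}{x}, \\
F(x) &= 1 - e^{-(x/\theta)^{\tau}}, \\
E[X^k] &= \theta^k\,\Gamma(1 + k/\tau), \quad k > -\tau, \\
E[(X \wedge x)^k] &= \theta^k\,\Gamma(1 + k/\tau)\,\Gamma[1 + k/\tau; (x/\theta)^{\tau}] + x^k e^{-(x/\theta)^{\tau}}, \quad k > -\tau, \\
\text{Mode} &= \theta\left(\frac{\tau - 1}{\tau}\right)^{1/\tau}, \quad \tau > 1, \text{ else } 0, \\
\hat{\theta} &= \exp\left(\frac{g\ln(p) - \ln(q)}{g - 1}\right), \quad g = \frac{\ln(\ln(4))}{\ln(\ln(4/3))}, \\
\hat{\tau} &= \frac{\ln(\ln(4))}{\ln(q) - \ln(\hat{\theta})}.
\end{aligned}$$


A.3.2.4 Inverse Weibull—θ, τ (log-Gompertz)

$$\begin{aligned}
f(x) &= \frac{\tau (\theta/x)^{\tau} e^{-(\theta/x)^{\tau}}}{x}, \\
F(x) &= e^{-(\theta/x)^{\tau}}, \\
E[X^k] &= \theta^k\,\Gamma(1 - k/\tau), \quad k < \tau, \\
E[(X \wedge x)^k] &= \theta^k\,\Gamma(1 - k/\tau)\{1 - \Gamma[1 - k/\tau; (\theta/x)^{\tau}]\} + x^k\left[1 - e^{-(\theta/x)^{\tau}}\right] \\
&= \theta^k\,G[1 - k/\tau; (\theta/x)^{\tau}] + x^k\left[1 - e^{-(\theta/x)^{\tau}}\right], \quad \text{all } k, \\
\text{Mode} &= \theta\left(\frac{\tau}{\tau + 1}\right)^{1/\tau}, \\
\hat{\theta} &= \exp\left(\frac{g\ln(q) - \ln(p)}{g - 1}\right), \quad g = \frac{\ln(\ln(4))}{\ln(\ln(4/3))}, \\
\hat{\tau} &= \frac{\ln(\ln(4))}{\ln(\hat{\theta}) - \ln(p)}.
\end{aligned}$$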

A.3.3 One-parameter distributions 


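A.3.3.1 Exponential—θ

$$\begin{aligned}
f(x) &= \frac{e^{-x/\theta}}{\theta}, \\
F(x) &= 1 - e^{-x/\theta}, \\
E[X^k] &= \theta^k\,\Gamma(k + 1), \quad k > -1, \\
E[X^k] &= \theta^k\,k! \quad \text{if } k \text{ is an integer}, \\
E[X \wedge x] &= \theta(1 - e^{-x/\theta}), \\
E[(X \wedge x)^k] &= \theta^k\,\Gamma(k + 1)\,\Gamma(k + 1; x/\theta) + x^k e^{-x/\theta}, \quad k > -1, \\
E[(X \wedge x)^k] &= \theta^k\,k!\,\Gamma(k + 1; x/\theta) + x^k e^{-x/\theta} \quad \text{if } k > -1 \text{ is an integer}, \\
M(t) &= (1 - \theta t)^{-1}, \quad t < 1/\theta, \\
\text{Mode} &= 0, \\
\hat{\theta} &= m.
\end{aligned}$$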


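A.3.3.2 Inverse exponential—θ

$$\begin{aligned}
f(x) &= \frac{\theta e^{-\theta/x}}{x^2}, \\
F(x) &= e^{-\theta/x}, \\
E[X^k] &= \theta^k\,\Gamma(1 - k), \quad k < 1, \\
E[(X \wedge x)^k] &= \theta^k\,G(1 - k; \theta/x) + x^k(1 - e^{-\theta/x}), \quad \text{all } k, \\
\text{Mode} &= \theta/2, \\
\hat{\theta} &= -q\ln(3/4).
\end{aligned}$$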


A.4 OTHER DISTRIBUTIONS 


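A.4.1.1 Lognormal—μ, σ (μ can be negative)

$$\begin{aligned}
f(x) &= \frac{1}{x\sigma\sqrt{2\pi}}\exp(-z^2/2) = \frac{\phi(z)}{\sigma x}, \quad z = \frac{\ln x - \mu}{\sigma}, \\
F(x) &= \Phi(z), \\
E[X^k] &= \exp\left(k\mu + \tfrac{1}{2}k^2\sigma^2\right), \\
E[(X \wedge x)^k] &= \exp\left(k\mu + \tfrac{1}{2}k^2\sigma^2\right)\Phi\left(\frac{\ln x - \mu - k\sigma^2}{\sigma}\right) + x^k[1 - F(x)], \\
\text{Mode} &= \exp(\mu - \sigma^2), \\
\hat{\sigma} &= \sqrt{\ln(t) - 2\ln(m)}, \quad \hat{\mu} = \ln(m) - \tfrac{1}{2}\hat{\sigma}^2.
\end{aligned}$$


A.4.1.2 Inverse Gaussian—μ, θ

$$\begin{aligned}
f(x) &= \left(\frac{\theta}{2\pi x^3}\right)^{1/2}\exp\left(-\frac{\theta z^2}{2x}\right), \quad z = \frac{x - \mu}{\mu}, \\
F(x) &= \Phi\left[z\left(\frac{\theta}{x}\right)^{1/2}\right] + \exp(2\theta/\mu)\,\Phi\left[-y\left(\frac{\theta}{x}\right)^{1/2}\right], \quad y = \frac{x + \mu}{\mu}, \\
E[X] &= \mu, \quad \mathrm{Var}[X] = \mu^3/\theta, \\
E[X \wedge x] &= x - \mu z\,\Phi\left[z\left(\frac{\theta}{x}\right)^{1/2}\right] - \mu y\exp(2\theta/\mu)\,\Phi\left[-y\left(\frac{\theta}{x}\right)^{1/2}\right], \\
M(t) &= \exp\left[\frac{\theta}{\mu}\left(1 - \sqrt{1 - \frac{2t\mu^2}{\theta}}\right)\right], \quad t < \frac{\theta}{2\mu^2}, \\
\hat{\mu} &= m, \quad \hat{\theta} = \frac{m^3}{t - m^2}.
\end{aligned}$$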


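A.4.1.3 log-t—r, μ, σ (μ can be negative) Let Y have a t distribution
with r degrees of freedom. Then X = exp(σY + μ) has the log-t distribution.
Positive moments do not exist for this distribution. Just as the t distribution
has a heavier tail than the normal distribution, this distribution has a heavier
tail than the lognormal distribution.

$$\begin{aligned}
f(x) &= \frac{\Gamma\left(\frac{r + 1}{2}\right)}{x\sigma\sqrt{\pi r}\,\Gamma\left(\frac{r}{2}\right)\left[1 + \frac{1}{r}\left(\frac{\ln x - \mu}{\sigma}\right)^2\right]^{(r+1)/2}}, \\
F(x) &= F_r\left(\frac{\ln x - \mu}{\sigma}\right) \text{ with } F_r(t) \text{ the cdf of a } t \text{ distribution with } r \text{ d.f.}, \\
F(x) &= \begin{cases}
\dfrac{1}{2}\,\beta\left[\dfrac{r}{2}, \dfrac{1}{2}; \dfrac{r}{r + \left(\frac{\ln x - \mu}{\sigma}\right)^2}\right], & 0 < x \le e^{\mu}, \\[2ex]
1 - \dfrac{1}{2}\,\beta\left[\dfrac{r}{2}, \dfrac{1}{2}; \dfrac{r}{r + \left(\frac{\ln x - \mu}{\sigma}\right)^2}\right], & x \ge e^{\mu}.
\end{cases}
\end{aligned}$$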
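A.4.1.4 Single-parameter Pareto—α, θ

$$\begin{aligned}
f(x) &= \frac{\alpha\theta^{\alpha}}{x^{\alpha+1}}, \quad x > \theta, \\
F(x) &= 1 - \left(\frac{\theta}{x}\right)^{\alpha}, \quad x > \theta, \\
E[X^k] &= \frac{\alpha\theta^k}{\alpha - k}, \quad k < \alpha, \\
E[(X \wedge x)^k] &= \frac{\alpha\theta^k}{\alpha - k} - \frac{k\theta^{\alpha}}{(\alpha - k)x^{\alpha-k}}, \quad x \ge \theta, \\
\text{Mode} &= \theta, \\
\hat{\alpha} &= \frac{m}{m - \theta}.
\end{aligned}$$


Note: Although there appear to be two parameters, only α is a true
parameter. The value of θ must be set in advance.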


A.5 DISTRIBUTIONS WITH FINITE SUPPORT 


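For these two distributions, the scale parameter θ is assumed known.


A.5.1.1 Generalized beta—a, b, θ, τ

$$\begin{aligned}
f(x) &= \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\,u^{a}(1 - u)^{b-1}\,\frac{\tau}{x}, \quad 0 < x < \theta, \quad u = (x/\theta)^{\tau}, \\
F(x) &= \beta(a, b; u), \\
E[X^k] &= \frac{\theta^k\,\Gamma(a + b)\Gamma(a + k/\tau)}{\Gamma(a)\Gamma(a + b + k/\tau)}, \quad k > -a\tau, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,\Gamma(a + b)\Gamma(a + k/\tau)}{\Gamma(a)\Gamma(a + b + k/\tau)}\,\beta(a + k/\tau, b; u) + x^k[1 - \beta(a, b; u)].
\end{aligned}$$


A.5.1.2 beta—a, b, θ

$$\begin{aligned}
f(x) &= \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\,u^{a}(1 - u)^{b-1}\,\frac{1}{x}, \quad 0 < x < \theta, \quad u = x/\theta, \\
F(x) &= \beta(a, b; u), \\
E[X^k] &= \frac{\theta^k\,\Gamma(a + b)\Gamma(a + k)}{\Gamma(a)\Gamma(a + b + k)}, \quad k > -a, \\
E[X^k] &= \frac{\theta^k\,a(a + 1)\cdots(a + k - 1)}{(a + b)(a + b + 1)\cdots(a + b + k - 1)} \quad \text{if } k \text{ is an integer}, \\
E[(X \wedge x)^k] &= \frac{\theta^k\,a(a + 1)\cdots(a + k - 1)}{(a + b)(a + b + 1)\cdots(a + b + k - 1)}\,\beta(a + k, b; u) \\
&\quad + x^k[1 - \beta(a, b; u)], \\
\hat{a} &= \frac{\theta m^2 - mt}{\theta t - \theta m^2}, \quad \hat{b} = \frac{(\theta m - t)(\theta - m)}{\theta t - \theta m^2}.
\end{aligned}$$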

Appendix B 


An inventory of 
discrete distributions 


B.1 INTRODUCTION 


The 16 models fall into three classes. The divisions are based on the algo- 
rithm by which the probabilities are computed. For some of the more familiar 
distributions these formulas will look different from the ones you may have 
learned, but they produce the same probabilities. After each name, the
parameters are given. All parameters are positive unless otherwise indicated. In
all cases, p_k is the probability of observing k losses.

For finding moments, the most convenient form is to give the factorial
moments. The jth factorial moment is μ_(j) = E[N(N − 1)···(N − j + 1)].
We have E[N] = μ_(1) and Var(N) = μ_(2) + μ_(1) − μ_(1)².

The estimators which are presented are not intended to be useful estimators
but rather for providing starting values for maximizing the likelihood (or
other) function. For determining starting values, the following quantities are
used [where n_k is the observed frequency at k (if, for the last entry, n_k
represents the number of observations at k or more, assume it was at exactly k)
and n is the sample size]:

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{\infty} k n_k, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{\infty} k^2 n_k - \hat{\mu}^2.$$

When the method of moments is used to determine the starting value, a
circumflex (e.g., λ̂) is used. For any other method, a tilde (e.g., λ̃) is used.
When the starting value formulas do not provide admissible parameter values,
a truly crude guess is to set the product of all λ and β parameters equal to
the sample mean and set all other parameters equal to 1. If there are two λ
and/or β parameters, an easy choice is to set each to the square root of the
sample mean.
The last item presented is the probability generating function,

$$P(z) = E[z^N].$$


B.2 THE (a, b, 0) CLASS


The distributions in this class have support on 0, 1, .... For this class, a
particular distribution is specified by setting p_0 and then using p_k = (a +
b/k)p_{k−1}. Specific members are created by setting p_0, a, and b. For any
member, μ_(1) = (a + b)/(1 − a), and for higher j, μ_(j) = (aj + b)μ_(j−1)/(1 − a).
The variance is (a + b)/(1 − a)².
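
The recursion that defines this class is easy to carry out numerically. The sketch below is an illustration only (the function names and the truncation point of 200 are hypothetical); it builds the probabilities from p_0, a, and b and compares the resulting mean and variance with the formulas above, using the negative binomial with r = 3 and β = 2.

```python
def ab0_probabilities(p0, a, b, k_max):
    """p_0, ..., p_{k_max} from p_k = (a + b/k) p_{k-1}."""
    probs = [p0]
    for k in range(1, k_max + 1):
        probs.append((a + b / k) * probs[-1])
    return probs


def mean_and_variance(probs):
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second - mean * mean


# Negative binomial with r = 3 and beta = 2.
r, beta = 3.0, 2.0
a = beta / (1.0 + beta)
b = (r - 1.0) * beta / (1.0 + beta)
negbin = ab0_probabilities((1.0 + beta) ** (-r), a, b, 200)
mean, variance = mean_and_variance(negbin)
cdf_at_5 = sum(negbin[:6])
```

The truncation at 200 terms is an assumption that the omitted tail is negligible, which holds here but should be checked for heavier-tailed parameter choices.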


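B.2.1.1 Poisson—λ

$$\begin{aligned}
p_0 &= e^{-\lambda}, \quad a = 0, \quad b = \lambda, \\
p_k &= \frac{e^{-\lambda}\lambda^k}{k!}, \\
E[N] &= \lambda, \quad \mathrm{Var}[N] = \lambda, \\
\hat{\lambda} &= \hat{\mu}, \\
P(z) &= e^{\lambda(z-1)}.
\end{aligned}$$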


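B.2.1.2 Geometric—β

$$\begin{aligned}
p_0 &= \frac{1}{1 + \beta}, \quad a = \frac{\beta}{1 + \beta}, \quad b = 0, \\
p_k &= \frac{\beta^k}{(1 + \beta)^{k+1}}, \\
E[N] &= \beta, \quad \mathrm{Var}[N] = \beta(1 + \beta), \\
\hat{\beta} &= \hat{\mu}, \\
P(z) &= [1 - \beta(z - 1)]^{-1}.
\end{aligned}$$

This is a special case of the negative binomial with r = 1.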


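B.2.1.3 Binomial—q, m (0 < q < 1, m an integer)

$$\begin{aligned}
p_0 &= (1 - q)^m, \quad a = -\frac{q}{1 - q}, \quad b = \frac{(m + 1)q}{1 - q}, \\
p_k &= \binom{m}{k} q^k (1 - q)^{m-k}, \quad k = 0, 1, \ldots, m, \\
E[N] &= mq, \quad \mathrm{Var}[N] = mq(1 - q), \\
\hat{q} &= \hat{\mu}/m, \\
P(z) &= [1 + q(z - 1)]^m.
\end{aligned}$$


B.2.1.4 Negative binomial—β, r

$$\begin{aligned}
p_0 &= (1 + \beta)^{-r}, \quad a = \frac{\beta}{1 + \beta}, \quad b = \frac{(r - 1)\beta}{1 + \beta}, \\
p_k &= \frac{r(r + 1)\cdots(r + k - 1)\beta^k}{k!\,(1 + \beta)^{r+k}}, \\
E[N] &= r\beta, \quad \mathrm{Var}[N] = r\beta(1 + \beta), \\
\hat{\beta} &= \frac{\hat{\sigma}^2}{\hat{\mu}} - 1, \quad \hat{r} = \frac{\hat{\mu}^2}{\hat{\sigma}^2 - \hat{\mu}}, \\
P(z) &= [1 - \beta(z - 1)]^{-r}.
\end{aligned}$$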

B.3 THE (a, b, 1) CLASS


To distinguish this class from the (a, b, 0) class, the probabilities are denoted
Pr(N = k) = p_k^M or Pr(N = k) = p_k^T depending on which subclass is being
represented. For this class, p_0^M is arbitrary (that is, it is a parameter) and
then p_1^M or p_1^T is a specified function of the parameters a and b. Subsequent
probabilities are obtained recursively as in the (a, b, 0) class: p_k^M = (a +
b/k)p_{k−1}^M, k = 2, 3, ..., with the same recursion for p_k^T. There are two
subclasses of this class. When discussing their members, we often refer to the
“corresponding” member of the (a, b, 0) class. This refers to the member of
that class with the same values for a and b. The notation p_k will continue to
be used for probabilities for the corresponding (a, b, 0) distribution.


B.3.1 The zero-truncated subclass 


The members of this class have p_0^T = 0 and therefore it need not be estimated.
These distributions should only be used when a value of zero is impossible.
The first factorial moment is μ_(1) = (a + b)/[(1 − a)(1 − p_0)], where p_0 is the
value for the corresponding member of the (a, b, 0) class. For the logarithmic
distribution (which has no corresponding member), μ_(1) = β/ln(1 + β). Higher
factorial moments are obtained recursively with the same formula as with the
(a, b, 0) class. The variance is (a + b)[1 − (a + b + 1)p_0]/[(1 − a)(1 − p_0)]². For
those members of the subclass which have corresponding (a, b, 0) distributions,

$$p_k^T = p_k/(1 - p_0).$$


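B.3.1.1 Zero-truncated Poisson—λ

$$\begin{aligned}
p_1^T &= \frac{\lambda}{e^{\lambda} - 1}, \quad a = 0, \quad b = \lambda, \\
p_k^T &= \frac{\lambda^k}{k!\,(e^{\lambda} - 1)}, \\
E[N] &= \lambda/(1 - e^{-\lambda}), \quad \mathrm{Var}[N] = \lambda[1 - (\lambda + 1)e^{-\lambda}]/(1 - e^{-\lambda})^2, \\
\tilde{\lambda} &= \ln(n\hat{\mu}/n_1), \\
P(z) &= \frac{e^{\lambda z} - 1}{e^{\lambda} - 1}.
\end{aligned}$$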


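B.3.1.2 Zero-truncated geometric—β

$$\begin{aligned}
p_1^T &= \frac{1}{1 + \beta}, \quad a = \frac{\beta}{1 + \beta}, \quad b = 0, \\
p_k^T &= \frac{\beta^{k-1}}{(1 + \beta)^k}, \\
E[N] &= 1 + \beta, \quad \mathrm{Var}[N] = \beta(1 + \beta), \\
\hat{\beta} &= \hat{\mu} - 1, \\
P(z) &= \frac{[1 - \beta(z - 1)]^{-1} - (1 + \beta)^{-1}}{1 - (1 + \beta)^{-1}}.
\end{aligned}$$

This is a special case of the zero-truncated negative binomial with r = 1.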


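B.3.1.3 Logarithmic—β

$$\begin{aligned}
p_1^T &= \frac{\beta}{(1 + \beta)\ln(1 + \beta)}, \quad a = \frac{\beta}{1 + \beta}, \quad b = -\frac{\beta}{1 + \beta}, \\
p_k^T &= \frac{\beta^k}{k(1 + \beta)^k \ln(1 + \beta)}, \\
E[N] &= \beta/\ln(1 + \beta), \quad \mathrm{Var}[N] = \frac{\beta[1 + \beta - \beta/\ln(1 + \beta)]}{\ln(1 + \beta)}, \\
\tilde{\beta} &= \frac{n\hat{\mu}}{n_1} - 1 \quad \text{or} \quad \frac{2(\hat{\mu} - 1)}{\hat{\mu}}, \\
P(z) &= 1 - \frac{\ln[1 - \beta(z - 1)]}{\ln(1 + \beta)}.
\end{aligned}$$

This is a limiting case of the zero-truncated negative binomial as r → 0.


B.3.1.4 Zero-truncated binomial—q, m (0 < q < 1, m an integer)

$$\begin{aligned}
p_1^T &= \frac{m(1 - q)^{m-1} q}{1 - (1 - q)^m}, \quad a = -\frac{q}{1 - q}, \quad b = \frac{(m + 1)q}{1 - q}, \\
p_k^T &= \frac{\binom{m}{k} q^k (1 - q)^{m-k}}{1 - (1 - q)^m}, \quad k = 1, 2, \ldots, m, \\
E[N] &= \frac{mq}{1 - (1 - q)^m}, \\
\mathrm{Var}[N] &= \frac{mq[(1 - q) - (1 - q + mq)(1 - q)^m]}{[1 - (1 - q)^m]^2}, \\
\tilde{q} &= \hat{\mu}/m, \\
P(z) &= \frac{[1 + q(z - 1)]^m - (1 - q)^m}{1 - (1 - q)^m}.
\end{aligned}$$


B.3.1.5 Zero-truncated negative binomial—β, r (r > −1, r ≠ 0)

$$\begin{aligned}
p_1^T &= \frac{r\beta}{(1 + \beta)^{r+1} - (1 + \beta)}, \quad a = \frac{\beta}{1 + \beta}, \quad b = \frac{(r - 1)\beta}{1 + \beta}, \\
p_k^T &= \frac{r(r + 1)\cdots(r + k - 1)}{k!\,[(1 + \beta)^r - 1]}\left(\frac{\beta}{1 + \beta}\right)^k, \\
E[N] &= \frac{r\beta}{1 - (1 + \beta)^{-r}}, \\
\mathrm{Var}[N] &= \frac{r\beta[(1 + \beta) - (1 + \beta + r\beta)(1 + \beta)^{-r}]}{[1 - (1 + \beta)^{-r}]^2}, \\
\tilde{\beta} &= \frac{\hat{\sigma}^2}{\hat{\mu}} - 1, \quad \tilde{r} = \frac{\hat{\mu}^2}{\hat{\sigma}^2 - \hat{\mu}}, \\
P(z) &= \frac{[1 - \beta(z - 1)]^{-r} - (1 + \beta)^{-r}}{1 - (1 + \beta)^{-r}}.
\end{aligned}$$

This distribution is sometimes called the extended truncated negative
binomial distribution because the parameter r can extend below 0.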


B.3.2 The zero-modified subclass 


A zero-modified distribution is created by starting with a truncated
distribution and then placing an arbitrary amount of probability at zero. This
probability, p_0^M, is a parameter. The remaining probabilities are adjusted
accordingly. Values of p_k^M can be determined from the corresponding
zero-truncated distribution as p_k^M = (1 − p_0^M)p_k^T or from the corresponding (a, b, 0)
distribution as p_k^M = (1 − p_0^M)p_k/(1 − p_0). The same recursion used for the
zero-truncated subclass applies.

The mean is 1 − p_0^M times the mean for the corresponding zero-truncated
distribution. The variance is 1 − p_0^M times the zero-truncated variance plus
p_0^M(1 − p_0^M) times the square of the zero-truncated mean. The probability
generating function is P^M(z) = p_0^M + (1 − p_0^M)P(z), where P(z) is the probability
generating function for the corresponding zero-truncated distribution.

The maximum likelihood estimator of p_0^M is always the sample relative
frequency at 0.

B.4 THE COMPOUND CLASS 


Members of this class are obtained by compounding one distribution with
another. That is, let N be a discrete distribution, called the primary
distribution, and let M_1, M_2, ... be identically and independently distributed with
another discrete distribution, called the secondary distribution. The
compound distribution is S = M_1 + ··· + M_N. The probabilities for the compound
distributions are found from

$$p_k = \frac{1}{1 - a f_0}\sum_{y=1}^{k} (a + by/k)\,f_y\,p_{k-y}$$

for k = 1, 2, ..., where a and b are the usual values for the primary distribution
[which must be a member of the (a, b, 0) class] and f_y is p_y for the secondary
distribution. The only two primary distributions used here are Poisson (for
which p_0 = exp[−λ(1 − f_0)]) and geometric (for which p_0 = 1/[1 + β − βf_0]).
Because this information completely describes these distributions, only the
names and starting values are given below.

The moments can be found from the moments of the individual
distributions:

$$E[S] = E[N]E[M] \quad \text{and} \quad \mathrm{Var}[S] = E[N]\,\mathrm{Var}[M] + \mathrm{Var}[N]\,E[M]^2.$$

The probability generating function is P(z) = P_primary[P_secondary(z)].
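
The recursion for compound probabilities can be sketched as follows, using a Poisson primary distribution with a Poisson secondary distribution as the illustration. The function names, the parameter values λ₁ = 2 and λ₂ = 1.5, and the truncation points are hypothetical choices; the moment formulas above provide a check on the result.

```python
import math


def poisson_probabilities(lam, k_max):
    probs = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs


def compound_probabilities(a, b, p0, f, k_max):
    """p_k = sum_{y=1..k} (a + b*y/k) f_y p_{k-y} / (1 - a f_0), k = 1, ..., k_max."""
    probs = [p0]
    for k in range(1, k_max + 1):
        top = min(k, len(f) - 1)      # f_y is treated as 0 beyond the stored range
        s = sum((a + b * y / k) * f[y] * probs[k - y] for y in range(1, top + 1))
        probs.append(s / (1.0 - a * f[0]))
    return probs


lam1, lam2 = 2.0, 1.5
f = poisson_probabilities(lam2, 60)               # secondary distribution
p0 = math.exp(-lam1 * (1.0 - f[0]))               # Poisson primary: a = 0, b = lam1
compound = compound_probabilities(0.0, lam1, p0, f, 80)

mean = sum(k * p for k, p in enumerate(compound))
variance = sum(k * k * p for k, p in enumerate(compound)) - mean * mean
```

For these parameters the moment formulas give E[S] = 2(1.5) = 3 and Var[S] = 2(1.5) + 2(1.5)² = 7.5, which the computed values should match up to truncation error.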

In the following list the primary distribution is always named first. For the 
first, second, and fourth distributions, the secondary distribution is the (a, b, 0) 
class member with that name. For the third and the last three distributions 
(the Poisson-ETNB and its two special cases) the secondary distribution is 
the zero-truncated version. 


B.4.1 Some compound distributions 


B.4.1.1 Poisson-binomial: $\lambda$, $q$, $m$ ($0 < q < 1$, $m$ an integer)

B.4.1.2 Poisson-Poisson: $\lambda_1$, $\lambda_2$ The parameter $\lambda_1$ is for the primary
Poisson distribution, and $\lambda_2$ is for the secondary Poisson distribution. This
distribution is also called the Neyman Type A.


B.4.1.3 Geometric-extended truncated negative binomial: $\beta_1$, $\beta_2$, $r$ ($r > -1$)
The parameter $\beta_1$ is for the primary geometric distribution. The last two
parameters are for the secondary distribution, noting that for r = 0 the
secondary distribution is logarithmic. The truncated version is used so that the
extension of r is available.


By = Bo = vÈ. 
B.4.1.4 Geometric-Poisson: $\beta$, $\lambda$
B= = yh. 


B.4.1.5 Poisson-extended truncated negative binomial: $\lambda$, $\beta$, $r$ ($r > -1$, $r \ne 0$)
When r = 0 the secondary distribution is logarithmic, resulting in the
negative binomial distribution.


p = MK—30°+29)-26*-A g hb gnd 
ji(K — 36? + 2A) — (6? — pr)?’ A+)! *B 
or, 
pod poo a OE a 
(6? 5 p?)(no/n) In(no/n) — fu(jeno/n — n/n) 
a2 a a 
~ GO —p X H 
= TN À = Ts 
P= Fase ATA 
where 


1 i 
oer Bp, ant rn 4 203. 
K=—) kn 3A- Yk np + Qi 
k=0 k=0 
This distribution is also called the generalized Poisson—Pascal. 


B.4.1.6 Polya-Aeppli: $\lambda$, $\beta$


This is a special case of the Poisson—extended truncated negative binomial 
with r = 1. It is actually a Poisson—truncated geometric. 


B.4.1.7 Poisson-inverse Gaussian: $\lambda$, $\beta$


Sayin B= Meo 


This is a special case of the Poisson—extended truncated negative binomial 
with r = —0.5. 


B.5 A HIERARCHY OF DISCRETE DISTRIBUTIONS


The following table indicates which distributions are special or limiting cases 
of others. For the special cases, one parameter is set equal to a constant to 
create the special case. For the limiting cases, two parameters go to infinity 
or zero in some special way. 


Distribution | Is a special case of | Is a limiting case of
Poisson | ZM Poisson | Negative binomial, Poisson-binomial, Poisson-inv. Gaussian, Polya-Aeppli, Neyman-A
ZT Poisson | ZM Poisson | ZT negative binomial
ZM Poisson | | ZM negative binomial
Geometric | Negative binomial, ZM geometric | Geometric-Poisson
ZT geometric | ZT negative binomial |
ZM geometric | ZM negative binomial |
Logarithmic | | ZT negative binomial
ZM logarithmic | | ZM negative binomial
Binomial | ZM binomial |
Negative binomial | ZM negative binomial | Poisson-ETNB
Poisson-inverse Gaussian | Poisson-ETNB |
Polya-Aeppli | Poisson-ETNB |
Neyman-A | | Poisson-ETNB


Appendix C 


Frequency and severity 
relationships 


Let $N^L$ be the number of losses random variable and let X be the severity
random variable. If there is a deductible of d imposed, there are two ways to
modify X. One is to create $Y^L$, the amount paid per loss:

$$Y^L = \begin{cases} 0, & X \le d, \\ X - d, & X > d. \end{cases}$$

In this case the appropriate frequency distribution continues to be $N^L$.
An alternative approach is to create $Y^P$, the amount paid per payment:

$$Y^P = \begin{cases} \text{undefined}, & X \le d, \\ X - d, & X > d. \end{cases}$$

In this case the frequency random variable must be altered to reflect the
number of payments. Let this variable be $N^P$. Assume that for each loss the
probability is $v = 1 - F_X(d)$ that a payment will result. Further assume that
the incidence of making a payment is independent of the number of losses.
Then $N^P = L_1 + L_2 + \cdots + L_N$ with $N = N^L$, where $L_j$ is 0 with
probability $1 - v$ and is 1 with probability $v$. Probability generating
functions yield the following relationships:


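$N^L$: Parameters for $N^P$

Poisson: $\lambda^* = v\lambda$

ZM Poisson: $p_0^{M*} = \dfrac{p_0^M - e^{-\lambda} + e^{-v\lambda} - p_0^M e^{-v\lambda}}{1 - e^{-\lambda}}$, $\lambda^* = v\lambda$

Binomial: $q^* = vq$

ZM binomial: $p_0^{M*} = \dfrac{p_0^M - (1 - q)^m + (1 - vq)^m - p_0^M (1 - vq)^m}{1 - (1 - q)^m}$, $q^* = vq$

Negative binomial: $\beta^* = v\beta$, $r^* = r$

ZM negative binomial: $p_0^{M*} = \dfrac{p_0^M - (1 + \beta)^{-r} + (1 + v\beta)^{-r} - p_0^M (1 + v\beta)^{-r}}{1 - (1 + \beta)^{-r}}$, $\beta^* = v\beta$, $r^* = r$

ZM logarithmic: $p_0^{M*} = 1 - (1 - p_0^M)\ln(1 + v\beta)/\ln(1 + \beta)$, $\beta^* = v\beta$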


The geometric distribution is not presented as it is a special case of the
negative binomial with r = 1. For zero-truncated distributions, the above
is still used as the distribution for $N^P$ will now be zero modified. For
compound distributions, modify only the secondary distribution. For ETNB
secondary distributions the parameter for the primary distribution is multiplied
by $1 - p_0^{M*}$ as obtained above while the secondary distribution remains zero
truncated (however, $\beta^* = v\beta$).

There are occasions in which frequency data are collected which provide
a model for $N^P$. There would have to have been a deductible d in place
and therefore v is available. It is possible to recover the distribution for
$N^L$, although there is no guarantee that reversing the process will produce a
legitimate probability distribution. The solutions are the same as above, only
now $v = 1/[1 - F_X(d)]$.

Now suppose the current frequency model is $N^d$, which is appropriate for
a deductible of d. Now suppose the deductible is to be changed to $d^*$. The
new frequency for payments is $N^{d^*}$ and is of the same type. Then use the
table with $v = [1 - F_X(d^*)]/[1 - F_X(d)]$.
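The ZM Poisson row of the table can be checked numerically. The sketch below is only an illustration: the function names, the exponential severity, and all parameter values are assumptions for this example. It computes the payment-frequency parameters and compares $p_0^{M*}$ with the probability generating function of $N^L$ evaluated at $1 - v$, which is the probability that no loss produces a payment.

```python
import math

def zm_poisson_for_payments(p0m, lam, v):
    """Parameters of N^P when N^L is zero-modified Poisson (table row above)."""
    e1, e2 = math.exp(-lam), math.exp(-v * lam)
    p0m_star = (p0m - e1 + e2 - p0m * e2) / (1.0 - e1)
    return p0m_star, v * lam

def zm_poisson_pmf(p0m, lam, k_max):
    """Probabilities p_0^M, ..., p_k_max^M of a zero-modified Poisson."""
    scale = (1.0 - p0m) / (1.0 - math.exp(-lam))
    tail = [scale * math.exp(-lam) * lam**k / math.factorial(k)
            for k in range(1, k_max + 1)]
    return [p0m] + tail

# Hypothetical inputs: exponential severity with mean theta, deductible d.
theta, d = 1000.0, 250.0
v = math.exp(-d / theta)          # v = 1 - F_X(d)
p0m, lam = 0.3, 2.0

p0m_star, lam_star = zm_poisson_for_payments(p0m, lam, v)

# Direct check: Pr(N^P = 0) = sum_k Pr(N^L = k) (1 - v)^k.
losses = zm_poisson_pmf(p0m, lam, 80)
check = sum(pk * (1.0 - v)**k for k, pk in enumerate(losses))

# The mean number of payments should be v times the mean number of losses.
mean_losses = (1.0 - p0m) * lam / (1.0 - math.exp(-lam))
mean_payments = (1.0 - p0m_star) * lam_star / (1.0 - math.exp(-lam_star))
print(p0m_star, lam_star, check, mean_payments, v * mean_losses)
```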


Appendix D 
The recursive formula 


The recursive formula is (where the frequency distribution is a member of the
(a, b, 1) class),

$$f_S(x) = \frac{[p_1 - (a + b)p_0] f_X(x) + \sum_{y=1}^{x} (a + by/x) f_X(y) f_S(x - y)}{1 - a f_X(0)},$$

where $f_S(x) = \Pr(S = x)$, x = 0, 1, 2, ..., $f_X(x) = \Pr(X = x)$, x = 0, 1, 2, ...,
$p_0 = \Pr(N = 0)$, and $p_1 = \Pr(N = 1)$. Note that the severity distribution
(X) must place probability on non-negative integers. The formula must be
initialized with the value of $f_S(0)$. These values are given in Table D.1. It
should be noted that, if N is a member of the (a, b, 0) class, $p_1 - (a + b)p_0 = 0$
and so the first term will vanish. If N is a member of the compound class, the
recursion must be run twice. The first pass uses the secondary distribution
for $p_0$, $p_1$, a, and b. The second pass uses the output from the first pass as
$f_X(x)$ and uses the primary distribution for $p_0$, $p_1$, a, and b.
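The recursion can be sketched in a few lines of Python. This is an illustrative implementation, not a reference one: the function name `aggregate_probs`, the zero-modified Poisson frequency, and the four-point severity distribution are all assumptions chosen for the example. The starting value is the ZM Poisson entry of Table D.1.

```python
import math

def aggregate_probs(a, b, p0, p1, fx, fs0, x_max):
    """Return [f_S(0), ..., f_S(x_max)] for an (a, b, 1) frequency distribution.

    fx[x] = Pr(X = x) on the non-negative integers; fs0 is the starting value.
    """
    fs = [fs0]
    for x in range(1, x_max + 1):
        fx_x = fx[x] if x < len(fx) else 0.0
        upper = min(x, len(fx) - 1)
        conv = sum((a + b * y / x) * fx[y] * fs[x - y] for y in range(1, upper + 1))
        fs.append(((p1 - (a + b) * p0) * fx_x + conv) / (1.0 - a * fx[0]))
    return fs

# Hypothetical example: zero-modified Poisson frequency (a = 0, b = lam).
lam, p0m = 1.5, 0.4
fx = [0.1, 0.5, 0.3, 0.1]                       # severity on 0, 1, 2, 3
p1 = (1.0 - p0m) * lam * math.exp(-lam) / (1.0 - math.exp(-lam))
fs0 = p0m + (1.0 - p0m) * (math.exp(lam * fx[0]) - 1.0) / (math.exp(lam) - 1.0)

fs = aggregate_probs(0.0, lam, p0m, p1, fx, fs0, 80)

# Sanity checks: probabilities sum to 1 and E[S] = E[N] E[X].
mean_s = sum(x * q for x, q in enumerate(fs))
mean_n = (1.0 - p0m) * lam / (1.0 - math.exp(-lam))
mean_x = sum(x * q for x, q in enumerate(fx))
print(sum(fs), mean_s, mean_n * mean_x)
```

For an (a, b, 0) frequency the same function applies, since the first term of the numerator is then zero.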
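Table D.1 Starting values ($f_S(0)$) for recursions

Distribution: $f_S(0)$

Poisson: $\exp[\lambda(f_0 - 1)]$

Geometric: $[1 + \beta(1 - f_0)]^{-1}$

Binomial: $[1 + q(f_0 - 1)]^m$

Negative binomial: $[1 + \beta(1 - f_0)]^{-r}$

ZM Poisson: $p_0^M + (1 - p_0^M)\dfrac{\exp(\lambda f_0) - 1}{\exp(\lambda) - 1}$

ZM geometric: $p_0^M + (1 - p_0^M)\dfrac{f_0}{1 + \beta(1 - f_0)}$

ZM binomial: $p_0^M + (1 - p_0^M)\dfrac{[1 + q(f_0 - 1)]^m - (1 - q)^m}{1 - (1 - q)^m}$

ZM negative binomial: $p_0^M + (1 - p_0^M)\dfrac{[1 + \beta(1 - f_0)]^{-r} - (1 + \beta)^{-r}}{1 - (1 + \beta)^{-r}}$

ZM logarithmic: $p_0^M + (1 - p_0^M)\left\{1 - \dfrac{\ln[1 + \beta(1 - f_0)]}{\ln(1 + \beta)}\right\}$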


Appendix E 


Discretization of the 
severity distribution 


There are two relatively simple ways to discretize the severity distribution. 
One is the method of rounding and the other is a mean-preserving method. 


E.1 THE METHOD OF ROUNDING 


This method has two features: All probabilities are positive, and the proba- 
bilities add to 1. Let h be the span and let Y be the discretized version of X. 
If there are no modifications, then 


$$f_j = \Pr(Y = jh) = \Pr[(j - \tfrac12)h \le X < (j + \tfrac12)h] = F_X[(j + \tfrac12)h] - F_X[(j - \tfrac12)h].$$
The recursive formula is then used with $f_X(j) = f_j$. Suppose a deductible of
d, limit of u, and coinsurance of $\alpha$ are to be applied. If the modifications are
to be applied before the discretization, then

$$g_0 = \frac{F_X(d + h/2) - F_X(d)}{1 - F_X(d)},$$

$$g_j = \frac{F_X[d + (j + 1/2)h] - F_X[d + (j - 1/2)h]}{1 - F_X(d)}, \quad j = 1, \ldots, \frac{u - d}{h} - 1,$$

$$g_{(u-d)/h} = \frac{1 - F_X(u - h/2)}{1 - F_X(d)},$$

where $g_j = \Pr(Z = j\alpha h)$ and Z is the modified distribution. This method
does not require that the limits be multiples of h but does require that u − d be
a multiple of h. This method gives the probabilities of payments per payment.

Finally, if there is truncation from above at u, change all denominators to
$F_X(u) - F_X(d)$ and also change the numerator of $g_{(u-d)/h}$ to
$F_X(u) - F_X(u - h/2)$.
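A minimal Python sketch of the unmodified method of rounding follows. The exponential severity, the span h = 100, and the function name `discretize_rounding` are assumptions for illustration only; any distribution function could be passed in.

```python
import math

def discretize_rounding(cdf, h, n):
    """f_j = F_X[(j + 1/2)h] - F_X[(j - 1/2)h] for j = 0, ..., n - 1.

    For j = 0 the lower limit is below zero, where F_X is taken to be 0.
    """
    probs = []
    for j in range(n):
        lower = cdf((j - 0.5) * h) if j > 0 else 0.0
        probs.append(cdf((j + 0.5) * h) - lower)
    return probs

# Hypothetical severity: exponential with mean theta.
theta = 1000.0

def exp_cdf(x):
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

h = 100.0
f = discretize_rounding(exp_cdf, h, 200)

# The probabilities telescope to F_X at the upper end of the last interval.
print(f[0], sum(f), exp_cdf(199.5 * h))
```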


E.2 MEAN PRESERVING 


This method ensures that the discretized distribution has the same mean as 
the original severity distribution. With no modifications the discretization is 


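$$f_0 = 1 - \frac{E[X \wedge h]}{h},$$

$$f_j = \frac{2E[X \wedge jh] - E[X \wedge (j - 1)h] - E[X \wedge (j + 1)h]}{h}, \quad j = 1, 2, \ldots.$$

For the modified distribution,

$$g_0 = 1 - \frac{E[X \wedge (d + h)] - E[X \wedge d]}{h[1 - F_X(d)]},$$

$$g_j = \frac{2E[X \wedge (d + jh)] - E[X \wedge (d + (j - 1)h)] - E[X \wedge (d + (j + 1)h)]}{h[1 - F_X(d)]}, \quad j = 1, \ldots, \frac{u - d}{h} - 1,$$

$$g_{(u-d)/h} = \frac{E[X \wedge u] - E[X \wedge (u - h)]}{h[1 - F_X(d)]}.$$

To incorporate truncation from above, change the denominators to

$$h[F_X(u) - F_X(d)]$$

and subtract $h[1 - F_X(u)]$ from the numerators of each of $g_0$ and $g_{(u-d)/h}$.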
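The unmodified mean-preserving formulas can be sketched as follows. The exponential severity (for which $E[X \wedge x] = \theta(1 - e^{-x/\theta})$), the span, and the function name are assumptions made for this illustration.

```python
import math

def discretize_mean_preserving(lev, h, n):
    """Mean-preserving probabilities f_0, ..., f_{n-1}.

    lev(x) must return the limited expected value E[X ^ x].
    """
    probs = [1.0 - lev(h) / h]
    for j in range(1, n):
        probs.append((2.0 * lev(j * h) - lev((j - 1) * h) - lev((j + 1) * h)) / h)
    return probs

# Hypothetical severity: exponential with mean theta.
theta = 1000.0

def exp_lev(x):
    return theta * (1.0 - math.exp(-x / theta))

h = 100.0
f = discretize_mean_preserving(exp_lev, h, 200)

# Apart from the truncated tail, the discretized mean matches theta.
mean = sum(j * h * fj for j, fj in enumerate(f))
print(sum(f), mean)
```

With only 200 points the tail beyond 199h is dropped, so the match with the true mean is close rather than exact.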


E.3 UNDISCRETIZATION OF A DISCRETIZED DISTRIBUTION


Assume we have $g_0 = \Pr(S = 0)$, the true probability that the random variable
is zero. Let $p_j = \Pr(S^* = jh)$, where $S^*$ is a discretized distribution and h
is the span. The following are approximations for the cdf and LEV of S,
the true distribution which was discretized as $S^*$. They are all based on the
assumption that S has a uniform distribution over the interval from
$(j - \frac12)h$ to $(j + \frac12)h$ for integral j. The first interval is from 0 to h/2,
and the probability $p_0 - g_0$ is assumed to be uniformly distributed over it. Let
$S^{**}$ be the random variable with this approximate mixed distribution. (It is
continuous, except for discrete probability $g_0$ at zero.) The approximate
distribution function can be found by interpolation as follows. First, let

$$F_j = F_{S^{**}}[(j + \tfrac12)h] = \sum_{i=0}^{j} p_i, \quad j = 0, 1, \ldots.$$

Then, for x in the interval $(j - \frac12)h$ to $(j + \frac12)h$,

$$\begin{aligned}
F_{S^{**}}(x) &= F_{j-1} + \int_{(j-1/2)h}^{x} h^{-1} p_j \, dt = F_{j-1} + [x - (j - \tfrac12)h]\frac{p_j}{h} \\
&= F_{j-1} + [x - (j - \tfrac12)h]\frac{F_j - F_{j-1}}{h} \\
&= (1 - w)F_{j-1} + wF_j, \quad w = \frac{x}{h} - j + \frac12.
\end{aligned}$$

Because the first interval is only half as wide, the formula for $0 \le x \le h/2$ is

$$F_{S^{**}}(x) = (1 - w)g_0 + wp_0, \quad w = \frac{2x}{h}.$$

It is also possible to express these formulas in terms of the discrete
probabilities:

$$F_{S^{**}}(x) = \begin{cases}
g_0 + \dfrac{2x}{h}[p_0 - g_0], & 0 < x \le \dfrac{h}{2}, \\[2ex]
\displaystyle\sum_{i=0}^{j-1} p_i + \dfrac{x - (j - 1/2)h}{h}\, p_j, & (j - \tfrac12)h < x \le (j + \tfrac12)h.
\end{cases}$$
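The interpolation for the distribution function can be sketched in Python as below. The function name and the small four-point example are assumptions for illustration; the argument `p` holds the discretized probabilities $p_0, p_1, \ldots$ and `g0` the true probability at zero.

```python
import math

def cdf_undiscretized(x, p, g0, h):
    """Approximate F_{S**}(x) by linear interpolation of the discretized cdf."""
    if x < 0.0:
        return 0.0
    if x <= h / 2.0:
        # First (half-width) interval: p[0] - g0 spread uniformly over (0, h/2].
        return g0 + (2.0 * x / h) * (p[0] - g0)
    # Find j with (j - 1/2)h < x <= (j + 1/2)h.
    j = math.ceil(x / h - 0.5)
    if j >= len(p):
        return sum(p)  # beyond the last interval
    return sum(p[:j]) + (x - (j - 0.5) * h) / h * p[j]

# Hypothetical discretized distribution with span h = 10.
p = [0.3, 0.4, 0.2, 0.1]
g0, h = 0.1, 10.0

for x in (0.0, 2.5, 5.0, 10.0, 15.0, 40.0):
    print(x, cdf_undiscretized(x, p, g0, h))
```

At the interval endpoints the interpolated value agrees with the cumulative sums $F_j$, so the approximation is continuous for x > 0.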


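With regard to the limited expected value, expressions for the first and kth
LEVs are

$$E(S^{**} \wedge x) = \begin{cases}
x(1 - g_0) - \dfrac{x^2}{h}(p_0 - g_0), & 0 < x \le \dfrac{h}{2}, \\[2ex]
\dfrac{h}{4}(p_0 - g_0) + \displaystyle\sum_{i=1}^{j-1} ihp_i + \dfrac{x^2 - [(j - 1/2)h]^2}{2h}\, p_j + x[1 - F_{S^{**}}(x)], & (j - \tfrac12)h < x \le (j + \tfrac12)h,
\end{cases}$$

and, for $0 < x \le h/2$,

$$E[(S^{**} \wedge x)^k] = \frac{2x^{k+1}}{h(k + 1)}(p_0 - g_0) + x^k[1 - F_{S^{**}}(x)],$$

while for $(j - \frac12)h < x \le (j + \frac12)h$,

$$\begin{aligned}
E[(S^{**} \wedge x)^k] ={}& \frac{(h/2)^k (p_0 - g_0)}{k + 1} + \sum_{i=1}^{j-1} \frac{h^k[(i + 1/2)^{k+1} - (i - 1/2)^{k+1}]}{k + 1}\, p_i \\
&+ \frac{x^{k+1} - [(j - 1/2)h]^{k+1}}{h(k + 1)}\, p_j + x^k[1 - F_{S^{**}}(x)].
\end{aligned}$$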


Appendix F 
Numerical optimization 
and solution of systems 
of equations 


Maximizing functions can be difficult when there are many variables. A variety
of numerical methods have been developed, and almost any will be sufficient
for the tasks set forth in this text. Here we present two options. The first is
to use the Excel® Solver add-in. It is fairly reliable, though at times it may
declare a maximum has been found when there is no maximum. A second
option is the simplex method. This method tends to be slower but is more
reliable. The final section of this appendix shows how the Solver and Goal
Seek routines in Excel® can be used to solve systems of equations.


F.1 MAXIMIZATION USING SOLVER


Solver is not automatically available when Excel® is installed. If it is
available, you can tell because Solver will appear on Excel's Tools menu. If it
does not, it must be added in. To do this, select Add-ins from the Tools
menu, check the Solver box, and then click OK. If Solver does not appear on
the add-in list, Solver was not installed when Excel® was installed on your
machine. This will be the case if a typical (as opposed to full or custom)
install was done. To install Solver, go to Add/Remove Programs in the Control
Panel and modify your Microsoft Office® installation. You will not need to
reinstall all of Office® to add the Solver.

Use of Solver is illustrated with an example in which maximum likelihood 
estimates for the gamma distribution are found for Data Set B right censored 
at 200. If you have not read far enough to appreciate this example, it is not 
important. 

Begin by setting up a spreadsheet in which the parameters (alpha and
theta) are in identifiable cells, as is the objective function (lnL). In this example
the parameters are in E1 and E2 and the objective function is in E3.¹


      A      B          C          D       E
1     x      f(x)       ln         alpha   1
2     27     0.000973   -6.93476   theta   1000
3     82     0.000921   -6.98976   lnL     -44.9125
4     115    0.000891   -7.02276
5     126    0.000882   -7.03376
6     155    0.000856   -7.06276
7     161    0.000851   -7.06876
8     200    0.818731   -2.8


The formulas underlying this spreadsheet are given below. 


1 Screenshots reprinted by permission from Microsoft Corporation. 




      A      B                                  C             D       E
1     x      f(x)                               ln            alpha   1
2     27     =GAMMADIST(A2,E$1,E$2,FALSE)       =LN(B2)       theta   1000
3     82     =GAMMADIST(A3,E$1,E$2,FALSE)       =LN(B3)       lnL     =SUM(C2:C8)
4     115    =GAMMADIST(A4,E$1,E$2,FALSE)       =LN(B4)
5     126    =GAMMADIST(A5,E$1,E$2,FALSE)       =LN(B5)
6     155    =GAMMADIST(A6,E$1,E$2,FALSE)       =LN(B6)
7     161    =GAMMADIST(A7,E$1,E$2,FALSE)       =LN(B7)
8     200    =1-GAMMADIST(A8,E$1,E$2,TRUE)      =14*LN(B8)


Note that trial values for alpha and theta have been entered (1 and 1,000). 
The better these guesses are, the higher the probability that Solver will suc- 
ceed in finding the maximum. Selecting Solver from the Tools menu brings 
up the following dialog box: 


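[Figure: Solver Parameters dialog box, with Set Target Cell pointing at the objective function, Equal To set to Max, By Changing Cells set to $E$1,$E$2, and an empty Subject to the Constraints list]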


The target cell is the location of the objective function and the By Changing 
Cells box contains the location of the parameters. These cells need not be 
contiguous. It turns out that clicking on Solve will get the job done, but there 
are two additional items to think about. First, Solver allows for the imposition 
of constraints. They can be added by clicking on Add which brings up the 
following dialog box: 
[Figure: Add Constraint dialog box]

The constraint $\alpha \ge 0$ has been entered. Solver does not allow the constraint
we really want, which is $\alpha > 0$. After entering a similar constraint for $\theta$, the
Solver dialog box looks like:

[Figure: Solver Parameters dialog box, with Equal To set to Max, By Changing Cells set to $E$1,$E$2, and the two constraints listed]

The reason adding the constraints is not needed here is that the solution
Solver finds meets the constraints anyway. Clicking on Options brings up the
following dialog box:

[Figure: Solver Options dialog box. Max Time: 100 seconds; Iterations: 100; Precision: 0.000001; Tolerance: 5%; Convergence: 0.0001; check boxes for Assume Linear Model, Use Automatic Scaling (checked), Assume Non-Negative, and Show Iteration Results; Estimates: Tangent or Quadratic; Derivatives: Forward or Central; Search: Newton or Conjugate]

Two changes have been made from the default settings. The Use Automatic
Scaling box has been checked. This improves performance when the
parameters are on different scales (as is the case here). Also, Central
approximate derivatives have been selected. Additional precision in the answer can
be obtained by making the Precision, Tolerance, and Convergence numbers
smaller. Clicking OK on the options box (no changes will be apparent in the
Solver box) and then clicking Solve results in the following:

[Figure: Solver Results dialog box with the message "Solver found a solution. All constraints and optimality conditions are satisfied."]

Clicking OK gives the answer.


      A      B             C           D       E
1     x      f(x)          ln          alpha   1.7022297
2     27     0.00094832    -6.960818   theta   229.46973
3     82     0.001627994   -6.420407   lnL     -43.648305
4     115    0.0017879     -6.326713
5     126    0.001817122   -6.310502
6     155    0.001852132   -6.291418
7     161    0.001853101   -6.290895
8     200    0.697300093   -5.047552


Users of Solver (or any numerical analysis routine) should always be wary
of the results. The program may announce a solution when the maximum has 
not been found, and it may give up when there is a maximum to be found. 
When the program gives up, it may be necessary to provide better starting 
values. To verify that an announced solution is legitimate (or at least is a 
local maximum), it is a good idea to check the function at nearby points to 
see that the values are indeed smaller. 
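The check at nearby points can also be done outside the spreadsheet. The following Python sketch recomputes the objective function of this example (six uncensored observations and 14 observations censored at 200, as in the spreadsheet formulas) and evaluates it at the reported solution and at points 10% away in each parameter. The helper names and the power-series evaluation of the gamma distribution function are choices made for this illustration, not part of Solver.

```python
import math

def gamma_cdf(x, alpha, theta, terms=200):
    """Gamma distribution function via the power series of the incomplete gamma."""
    z = x / theta
    term = 1.0 / math.gamma(alpha + 1.0)   # n = 0 term: 1 / Gamma(alpha + 1)
    total = term
    for n in range(1, terms):
        term *= z / (alpha + n)            # z^n / Gamma(alpha + n + 1)
        total += term
    return math.exp(-z) * z**alpha * total

def gamma_pdf(x, alpha, theta):
    z = x / theta
    return z**alpha * math.exp(-z) / (x * math.gamma(alpha))

def loglik(alpha, theta):
    """lnL as in the spreadsheet: densities for uncensored values, 14 censored at 200."""
    uncensored = [27, 82, 115, 126, 155, 161]
    ll = sum(math.log(gamma_pdf(x, alpha, theta)) for x in uncensored)
    ll += 14 * math.log(1.0 - gamma_cdf(200, alpha, theta))
    return ll

start = loglik(1.0, 1000.0)               # trial values used above
alpha_hat, theta_hat = 1.7022297, 229.46973
best = loglik(alpha_hat, theta_hat)       # solution reported by Solver

# Evaluate the function at nearby points; each should be smaller than `best`.
neighbors = [loglik(alpha_hat * (1 + da), theta_hat * (1 + dt))
             for da, dt in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]]
print(start, best, neighbors)
```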


F.2 THE SIMPLEX METHOD 


The method (which is not related to the simplex method from operations re- 
search) was introduced for use with maximum likelihood estimation by Nelder 
and Mead in 1965 [98]. An excellent reference (and the source of the partic- 
ular version presented here) is Sequential Simplex Optimization by Walters, 
Parker, Morgan, and Deming [134]. 

Let x be a k × 1 vector and f(x) be the function in question. The iterative
step begins with k + 1 vectors, $x_1, \ldots, x_{k+1}$, and the corresponding functional
values, $f_1, \ldots, f_{k+1}$. At any iteration the points will be ordered so that
$f_2 < \cdots < f_{k+1}$. When starting, also arrange for $f_1 < f_2$. Three of the points have
names: $x_1$ is called worstpoint, $x_2$ is called secondworstpoint, and $x_{k+1}$ is
called bestpoint. It should be noted that after the first iteration these names
may not perfectly describe the points. Now identify five new points. The first
one, $y_1$, is the center of $x_2, \ldots, x_{k+1}$. That is, $y_1 = \sum_{j=2}^{k+1} x_j / k$ and is called
midpoint. The other four points are found as follows:

$$\begin{aligned}
y_2 &= 2y_1 - x_1, && \text{refpoint}, \\
y_3 &= 2y_2 - x_1, && \text{doublepoint}, \\
y_4 &= (y_1 + y_2)/2, && \text{halfpoint}, \\
y_5 &= (y_1 + x_1)/2, && \text{centerpoint}.
\end{aligned}$$


Then let $g_2, \ldots, g_5$ be the corresponding functional values, that is, $g_j = f(y_j)$
(the value at $y_1$ is never used). The key is to replace worstpoint ($x_1$)
with one of these points. The decision process proceeds as follows:

1. If $f_2 < g_2 < f_{k+1}$, then replace it with refpoint.

2. If $g_2 \ge f_{k+1}$ and $g_3 > f_{k+1}$, then replace it with doublepoint.

3. If $g_2 \ge f_{k+1}$ and $g_3 \le f_{k+1}$, then replace it with refpoint.

4. If $f_1 < g_2 \le f_2$, then replace it with halfpoint.

5. If $g_2 \le f_1$, then replace it with centerpoint.


After the replacement has been made, the old secondworstpoint becomes 
the new worstpoint. The remaining k points are then ordered. The one with 
the smallest functional value becomes the new secondworstpoint, and the one 
with the largest functional value becomes the new bestpoint. In practice, 
there is no need to compute y3 and g3 until you have reached step 2. Also 
note that at most one of the pairs (y4, g4) and (ys, gs) needs to be obtained, 
depending on which (if any) of the conditions in steps 4 and 5 hold. 

Iterations continue until the set of k +1 points becomes tightly packed. 
There are a variety of ways to measure that criterion. One example would be 
to calculate the standard deviations of each of the components and then aver- 
age those values. Iterations can stop when a small enough value is obtained. 
Another option is to keep iterating until all k + 1 vectors agree to a specified 
number of significant digits. 
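The procedure described in this section can be sketched in Python as follows. This is one reading of the steps above, offered as an illustration: the function name, the stopping rule (average standard deviation of the components), the tolerance, and the quadratic test function with its starting points are all assumptions made for the example.

```python
import statistics

def simplex_maximize(f, start_points, tol=1e-8, max_iter=1000):
    """Maximize f using the simplex steps described above (k + 1 starting points)."""
    k = len(start_points) - 1
    # Order so that f1 < f2 < ... < f_{k+1}: worstpoint first, bestpoint last.
    pts = sorted((list(p) for p in start_points), key=f)
    for _ in range(max_iter):
        # Stop when the points are tightly packed.
        spread = sum(statistics.pstdev([p[i] for p in pts]) for i in range(k)) / k
        if spread < tol:
            break
        x1 = pts[0]
        f1, f2, f_best = f(pts[0]), f(pts[1]), f(pts[-1])
        y1 = [sum(p[i] for p in pts[1:]) / k for i in range(k)]   # midpoint
        y2 = [2.0 * y1[i] - x1[i] for i in range(k)]             # refpoint
        g2 = f(y2)
        if g2 >= f_best:
            y3 = [2.0 * y2[i] - x1[i] for i in range(k)]         # doublepoint
            new = y3 if f(y3) > f_best else y2                   # steps 2 and 3
        elif g2 > f2:
            new = y2                                             # step 1
        elif g2 > f1:
            new = [(y1[i] + y2[i]) / 2.0 for i in range(k)]      # step 4, halfpoint
        else:
            new = [(y1[i] + x1[i]) / 2.0 for i in range(k)]      # step 5, centerpoint
        # Old secondworstpoint becomes worstpoint; the other k points are re-ordered.
        pts = [pts[1]] + sorted(pts[2:] + [new], key=f)
    return max(pts, key=f)

# Hypothetical test function with its maximum at (1, 2).
def objective(v):
    return -(v[0] - 1.0) ** 2 - (v[1] - 2.0) ** 2

start = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
best = simplex_maximize(objective, start)
print(best, objective(best))
```

In practice the starting simplex would be built around trial parameter values, and the loglikelihood would take the place of the quadratic used here.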


F.3 USING EXCEL® TO SOLVE EQUATIONS 


In addition to maximizing and minimizing functions of several variables, 
Solver can also solve equations. By choosing the Value of: radio button in 
the Solver dialog box, a value can be entered and then Solver will manipulate 
the By Changing Cells in order to set the contents of the Target Cell equal to 
that value. If there is more than one function, the constraints can be used to 
set them up. The following spreadsheet and Solver dialog box are set up to 
solve the two equations x + y = 10 and x − y = 4 with starting values x = 8
and y = 5 (to illustrate that the starting values do not have to be solutions
to any of the equations).
The Solver dialog box is:

[Figure: Solver Parameters dialog box, with By Changing Cells set to $B$1,$B$2]

The solution is:

[Figure: spreadsheet showing the solution]


When there is only one equation with one unknown, the Goal Seek tool
in Excel® is easier to use. It is on the Tools menu and is almost always
installed with the standard installation process. Suppose we want the solution
of $xe^x = 10$. The following simple spreadsheet sets up the problem with a
starting value of x = 2.

x          2
equation   14.7781

The Goal Seek dialog box is:

The solution is:
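The same root can be found without a spreadsheet. The sketch below uses Newton's method, which is a stand-in chosen for this illustration and not a description of how Goal Seek works internally; the function name and tolerance are assumptions.

```python
import math

def solve_x_exp_x(target=10.0, x=2.0, tol=1e-12, max_iter=50):
    """Solve x * exp(x) = target by Newton's method from the starting value x."""
    for _ in range(max_iter):
        g = x * math.exp(x) - target
        if abs(g) < tol:
            break
        # The derivative of x * exp(x) is (1 + x) * exp(x).
        x -= g / ((1.0 + x) * math.exp(x))
    return x

start_value = 2.0 * math.exp(2.0)   # the spreadsheet's "equation" cell at x = 2
root = solve_x_exp_x()
print(start_value, root, root * math.exp(root))
```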
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joint distribution, 362 
loss function, 364 
marginal distribution, 362 
model distribution, 361 
posterior distribution, 362 
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Beta distribution, 641 
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estimation, 389 
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Continuous-time process, 210 
Convolution, 140 
numerical, 474 
Copula, 403 
Counting distributions, 72 
Covariates 
models with, 405 
proportional hazards model, 406 
Cox proportional hazards model, 407 
Cramér’s asymptotic ruin formula, 244 
Credibility 
Biihlmann credibility factor, 558 
expected hypothetical mean, 557 
expected process variance, 557 
fully parametric, 590 
greatest accuracy, 517, 542 
Bayesian, 545 
Biihlmann, 557 
Bihlmann-Straub, 560 
exact credibility, 566 
fully parametric, 602 
linear, 553 
linear vs. Bayes, 569 
log-credibility, 573 
nonparametric, 592 
semiparametric, 600 
hypothetical mean, 557 
limited fluctuation, 516, 530 
full credibility, 532 
partial credibility, 535 
nonparametric, 590 
partial, 535 
process variance, 557 
semiparametric, 590 
variance of the hypothetical means, 557 
Credibility factor, 535 
Cubic spline, 489 
Cumulative distribution function, 13 
Cumulative hazard rate function, 289 


D 


Data-dependent distribution, 45, 284 
Deductible 
effect of inflation, 122 
effect on frequency, 129 
franchise, 118 
ordinary, 116, 297 
Delta method, 356 
Density function, 17 
Density function plot, 423 
Difference plot, 424 
Digamma function, 588 
Discrete distribution, 72 
Discrete Fourier transform, 185 
Discrete random variable, 16 
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Discrete time process, 211, 215 
Distribution 
(a, b, 0) class, 81, 644 
(a, b, 1) class, 83, 645 
aggregate loss, 137 
beta, 641 
binomial—beta, 102 
binomial, 79, 645 
bivariate, 402 
Burr, 632 
claim count, 137 
compound, 141 
moments, 142 
compound frequency, 88, 648 
recursive formula, 92 
compound Poisson frequency, 95 
conditional, 518 
conjugate prior, 373 
copula, 403 
counting distributions, 72 
data-dependent, 45, 284 
defective, 260 
discrete, 72 
empirical, 284 
exponential, 638 
exponential dispersion family, 581 
extended truncated negative binomial 
(ETNB), 87 
frailty, 62 
frequency, 137 
gamma, 48, 58, 636 
generalized beta, 640 
generalized Pareto, 631 
generalized Poisson—Pascal, 649 
generalized Waring, 103, 379 
geometric-ETNB, 649 
geometric—Poisson, 649 
geometric, 77, 644 
improper prior, 361 
individual loss, 137 
infinitely divisible, 104 
inverse Burr, 632 
inverse exponential, 638 
inverse gamma, 636 
inverse Gaussian, 48, 639 
inverse paralogistic, 635 
inverse Pareto, 633 
inverse transformed, 57 
inverse transformed gamma, 71, 635 
inverse Weibull, 58, 637 
joint, 362, 518 
k-point mixture, 43 
kernel smoothed, 284 
linear exponential family, 371 
logarithmic, 87, 646 
loglogistic, 66, 634 
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lognormal, 59, 71, 638 
log-t, 639 
Makeham, 458 
marginal, 362, 518 
mixed frequency, 101 
mixture, 519 
mixture/mixing, 43, 59 
negative binomial, 76, 645 
as Poisson mixture, 78 
extended truncated, 87 
negative hypergeometric, 102 
Neyman Type A, 90, 649 
one-sided stable law, 261 
paralogistic, 634 
parametric, 41, 284 
parametric family, 42 
Pareto, 633 
Poisson—binomial, 648 
Poisson—inverse Gaussian, 650 
Poisson—Poisson, 90, 649 
Poisson—ETNB, 649 
Poisson—extended truncated negative 
binomial, 398 
Poisson—inverse Gaussian, 398 
Poisson—logarithmic, 94 
Poisson, 73, 644 
Polya—Aeppli, 649 
Polya~Eggenberger, 102 
posterior, 362 
predictive, 362, 545 
prior, 360 
scale, 41 
Sibuya, 88 
single parameter Pareto, 640 
spliced, 64 
tail weight, 48 
transformed, 57 
transformed beta, 66, 631 
transformed beta family, 69 
transformed gamma, 70, 635 
transformed gamma family, 69 
variable-component mixture, 44 
Waring, 103, 379 
Weibull, 58, 637 
Yule, 103, 379 
zero-modified, 85, 647 
zero-truncated, 85 
zero-truncated binomial, 647 
zero-truncated geometric, 646 
zero-truncated negative binomial, 647 
zero-truncated Poisson, 646 
zeta, 111, 451 
Distribution function, 13 
empirical, 288 
Distribution function plot, 421 
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Empirical Bayes estimation, 589 
Empirical distribution, 284 
Empirical distribution function, 288 
Empirical model, 27 
Estimate 
interval, 309 
Nelson—Aalen, 290 
Estimation 
(a, b, 1) class, 392 
Bayesian, 360 
binomial distribution, 389 
compound frequency distributions, 396 
credibility interval, 365 
effect of exposure, 398 
empirical Bayes, 589 
maximum likelihood, 337 
multiple decrement tables, 324 
negative binomial, 386 
point, 266 
Poisson distribution, 383 
Estimator 
asymptotically unbiased, 270 
Bayes estimate, 364 
bias, 268 
confidence interval, 275 
consistency, 270 
interval, 275 
Kaplan-Meier, 299 
kernel density, 316 
mean-squared error, 271 
method of moments, 332 
percentile matching, 333 
relative efficiency, 275 
smoothed empirical percentile, 333 
unbiased, 268 
uniformly minimum variance unbiased, 
272 
Exact credibility, 567 
Excess loss variable, 29 
Expectation, conditional, 520 
Exponential distribution, 638 
Exposure base, 108 
Exposure, effect in estimation, 398 


Extrapolation, using splines, 504 


F 


Failure rate, 20 

Fast Fourier transform, 186 
Fisher’s information, 352 
Force of mortality, 20 
Fourier transform, 185 
Frailty model, 62 
Franchise deductible, 118 
Frequency, 137 


effect of deductible, 129 
interaction with severity, 651 
Frequency /severity interaction, 174 
Full credibility, 532 
Function 
characteristic, 105 
cumulative hazard rate, 289 
density, 17 
empirical distribution, 288 
force of mortality, 20 
gamma, 58, 630 
hazard rate, 20 
incomplete beta, 629 
incomplete gamma, 58, 627 
likelihood, 338 
loglikelihood, 339 
loss, 364 
probability, 19, 73 
probability density, 17 
probability generating, 73 
survival, 16 


G 


Gamma distribution, 58, 636 
Gamma function, 58, 630 

incomplete, 627 
Gamma kernel, 318 
Generalized beta distribution, 640 
Generalized linear model, 413 
Generalized Pareto distribution, 631 
Generalized Poisson—Pascal distribution, 

649 

Generalized Waring distribution, 103, 379 
Generating function 

moment, 36 

probability, 36 
Geometric-ETNB distribution, 649 
Geometric—Poisson distribution, 649 
Geometric distribution, 77, 644 
Greatest accuracy credibility, 517, 542 


Greenwood’s approximation, 311 


H 


Hazard rate, 20 
cumulative, 289 
tail weight, 50 
Heckman-Meyers formula, 188 
Histogram, 293 
Hypothesis tests, 277, 427 
Anderson—Darling, 430 
chi-square goodness-of-fit, 432 
Kolmogorov—Smirnov, 428 
likelihood ratio test, 436, 442 
p-value, 280 
significance level, 278 
uniformly most powerful, 279 
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Hypothetical mean, 557 
I 


Incomplete beta function, 629 
Incomplete gamma function, 58, 627 
Independent increments, 210, 224 
Individual loss distribution, 137 
Individual risk model, 136, 192 

moments, 193 

direct calculation, 195 

recursion, 197 
Infinitely divisible distribution, 104 
Inflation 

effect of, 122 

effect of limit, 124 
Information, 352 

observed, 355 
Information matrix, 353 
Interpolation 

modified osculatory, 505 

polynomial, 485 
Interval estimator, 275 
Inverse Burr distribution, 632 
Inverse exponential distribution, 638 
Inverse gamma distribution, 636 
Inverse Gaussian distribution, 48, 639 
Inverse paralogistic distribution, 635 
Inverse Pareto distribution, 633 
Inverse transformed distribution, 57 
Inverse transformed gamma distribution, 

71, 635 
Inverse Weibull distribution, 58, 637 
Inversion method for aggregate loss 
calculations, 161, 184 

Joint distribution, 518 


K 


k-point mixture distribution, 43 
Kaplan-Meier estimator, 299 
large data sets, 322 
variance, 311 
Kernel density estimator, 316 
gamma kernel, 318 
triangular kernel, 318 
uniform kernel, 318 
Kernel smoothed distribution, 284 
Kolmogorov-Smirnov test, 428 
Kurtosis, 27 
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L 


Laplace transform 
for aggregate loss, 142 
Large data sets, 322 
Left censored and shifted variable, 30 
Left censoring, 297 
Left truncated and shifted variable, 29 
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Left truncation, 297 
Likelihood function, 338 
Likelihood ratio test, 436, 442 
Limit 
effect of inflation, 124 
policy, 298 
Limited expected value, 30 
Limited fluctuation credibility, 516, 530 
partial, 535 
Limited loss variable, 30 
Linear exponential family, 371 
Log-t distribution, 639 
Log-transformed confidence interval, 
313-314 
Logarithmic distribution, 87, 646 
Loglikelihood fuction, 339 
Loglogistic distribution, 66, 634 
Lognormal distribution, 59, 71, 638 
Loss elimination ratio, 121 
Loss function, 364 
Lundberg’s inequality, 230 


M 


Makeham distribution, 458 
Marginal distribution, 362, 518 
Markov process, 215 
Maximization, 660 
simplex method, 664 
Maximum aggregate loss, 239 
Maximum covered loss, 126 
Maximum likelihood estimation, 337 
binomial, 390 
inverse Gaussian, 350 
negative binomial, 387 
Poisson, 384 
variance, 385 
trucation and censoring, 341 
Maximum likelihood estimator 
consistency, 352 
unbiased, 352 
Mean, 25 
Mean excess loss, 29 
Mean residual life, 29 
tail weight, 50 
Mean squared error, 271 
Mean, conditional, 520 
Median, 34 
Method of moments, 332 
Mixed frequency distributions, 101 ` 
Mixed random variable, 16 
Mixture distribution, 519 
Mixture/mixing distribution, 43, 59 
Mode, 23 
Model 
collective risk, 135 
empirical, 27 
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individual risk, 136 
Model selection, 3 
graphical comparison, 421 
Schwarz Bayesian criterion, 443 
Modeling, advantages, 5 
Modeling process, 3 
Moment, 25 
Moment generating function, 36 
for aggregate loss, 142 
Moment 
individual risk model, 193 
factorial, 643 
limited expected value, 30 
of aggregate loss distribution, 142 
Mortality table construction, 322 
Multiple decrement tables, 324 


N 


Negative binomial distribution, 76, 645 
as compound Poisson-logarithmic, 94 
as Poisson mixture, 78 
estimation, 386 

Negative hypergeometric distribution, 102 

Nelson—Aalen estimate, 290 

Neyman Type A distribution, 90, 649 

Noninformative prior distribution, 361 

(0) 

Observed information, 355 

Ogive, 293 

Ordinary deductible, 116, 297 

Osculatory interpolation, 505 


P 


p-value, 280 
Paralogistic distribution, 634 
Parameter, 3 

scale, 41 
Parametric distribution, 41, 284 
Parametric distribution family, 42 
Pareto distribution, 633 
Parsimony, 440 
Partial credibility, 535 
Percentile, 34 
Percentile matching, 333 
Plot 

density function, 423 

difference, 424 

distribution function, 421 
Point estimation, 266 
Poisson-binomial distribution, 648 
Poisson-ETNB distribution, 398, 649 
Poisson-inverse Gaussian distribution, 398, 
Poisson-logarithmic distribution, 94 
Poisson distribution, 73, 644 


estimation, 383 
Poisson process, 223 
Policy limit, 124, 298 
Polya-Aeppli distribution, 649 
Polya-Eggenberger distribution, 102 
Polynomial interpolation, 485 
Polynomial, collocation, 485 
Posterior distribution, 362 
Predictive distribution, 362, 545 
Prior distribution, noninformative or 
vague, 361 
Probability density function, 17 
Probability function, 19, 73 
Probability generating function, 36, 73 
for aggregate loss, 141 
Probability mass function, 19 
Process variance, 557 
Process 
Brownian motion, 252 
compound Poisson, 225 
continuous time, 210 
discrete time, 211, 215 
independent increments, 210, 224 
Markov, 215 
Poisson, 223 
stationary increments, 211, 224 
surplus, 212 
Wiener, 253 
white noise, 253 
Product-limit estimator, 299 
large data sets, 322 
variance, 311 
Proportional hazards model, 406 
Pseudorandom variables, 612 
Pure premium, 515 


R 


Random variable 
central moment, 27 
coefficient of variation, 27 
continuous, 16 
discrete, 16 
excess loss, 29 
kurtosis, 27 
left censored and shifted, 30 
left truncated and shifted, 29 
limited expected value, 30 
limited loss, 30 
mean, 25 
mean excess loss, 29 
mean residual life, 29 
median, 34 
mixed, 16 
mode, 23 
moment, 25 
percentile, 34 
right censored, 31 
skewness, 27 

standard deviation, 27 
support, 16 

variance, 27 


Recursive formula, 653 


aggregate loss distribution, 161 
continuous severity distribution, 655 
for compound frequency, 92 


Recursive method for aggregate loss 
calculations, 161 


Relative efficiency, 275 
Relative security loading, 225 
Right censored variable, 31 
Right censoring, 297 

Right truncation, 297 

Risk model 


collective, 135 
individual, 136, 192 


Risk set, 289, 298 
Ruin 


asymptotic, 244 

continuous time, finite horizon, 214 
continuous time, infinite horizon, 213 
discrete time, finite horizon, 214 
discrete time, infinite horizon, 214 
evaluation by convolution, 216 
evaluation by inversion, 219 
Lundberg’s inequality, 230 

Tijms’ approximation, 244-245 

time to, as inverse Gaussian, 260 
time to, as one-sided stable law, 261 
using Brownian motion, 256 


Ruin theory, 209 


S 


Scale distribution, 41 

Scale parameter, 41 

Schwarz Bayesian criterion, 443 
Security loading, relative, 225 


Severity, interaction with frequency, 651 


Severity/frequency interaction, 174 
Sibuya distribution, 88 
Significance level, 278 
Simplex method, 664 
Simulation, 611 

aggregate loss calculations, 618 


Single-parameter Pareto distribution, 640 


Skewness, 27 
Smoothed empirical percentile estimate, 
333 
Smoothing splines, 505 
Solver, 660 
Spliced distribution, 64 
Splines 
cubic, 489 
extrapolation, 504 

smoothing, 505 
Standard deviation, 27 
Stationary increments, 211, 224 
Stop-loss insurance, 145 
Support, 16 
Surplus process, 212 

maximum aggregate loss, 239 
Survival function, 16 


T 


Tail weight, 48 
Tijms’ approximation, 244-245 
Transformed beta distribution, 66, 631 
Transformed beta family, 69 
Transformed distribution, 57 
Transformed gamma distribution, 70, 635 
Transformed gamma family, 69 
Triangular kernel, 318 
Trigamma function, 588 
Truncation 

from above, 297 

from below, 297 

left, 297 

right, 297 


U 


Unbiased, 4, 268 
maximum likelihood estimator, 352 
Uniform kernel, 318 
Uniformly minimum variance unbiased 
estimator (UMVUE), 272 
Uniformly most powerful test, 279 


V 


Vague prior distribution, 361 
Variable-component mixture, 44 
Variance, 27, 522 

conditional, 521 

delta method, 356 

Greenwood’s approximation, 311 

product-limit estimator, 311 
W 


Waring distribution, 103, 379 
Weibull distribution, 48, 58, 637 
Wiener process, 253 

White noise process, 253 


Y 
Yule distribution, 103, 379 
Z 


Zero-modified distribution, 85, 647 
Zero-truncated binomial distribution, 647 
Zero-truncated distribution, 85 
Zero-truncated negative binomial distribution, 647 